(57) Abstract

The invention relates to the transcriptional regulatory sequence (TRS) of carcinoembryonic antigen (CEA) and to molecular chimaeras
comprising the CEA TRS and DNA encoding a heterologous enzyme. CEA TRS is capable of targeting expression of the heterologous
enzyme to CEA+ cells and the heterologous enzyme is preferably an enzyme capable of catalysing the production of an agent cytotoxic or
cytostatic to CEA+ cells. For example the enzyme may be cytosine deaminase which is capable of catalysing formation of the cytotoxic
compound 5-fluorouracil from the non-toxic compound 5-fluorocytosine.

TRANSCRIPTIONAL REGULATORY SEQUENCE OF CARCINOEMBRYONIC ANTIGEN FOR
EXPRESSION TARGETING

The present invention relates to a transcriptional regulatory sequence useful in 
gene therapy. 

Colorectal carcinoma (CRC) is the second most frequent cancer and the second 
leading cause of cancer-associated deaths in the United States and Western Europe. The 
overall five-year survival rate for patients has not meaningfully improved in the last three 
decades. Prognosis for the CRC patient is associated with the depth of tumor
penetration into the bowel wall, the presence of regional lymph node involvement and,
most importantly, the presence of distant metastases. The liver is the most common site
for distant metastasis and, in approximately 30% of patients, the sole initial site of tumor
recurrence after successful resection of the primary colon cancer. Hepatic metastases are
the most common cause of death in the CRC patient.

The treatment of choice for the majority of patients with hepatic CRC metastasis
is systemic or regional chemotherapy using 5-fluorouracil (5-FU) alone or in combination
with other agents such as levamisole. However, despite extensive effort, there is still no
satisfactory treatment for hepatic CRC metastasis. Systemic single- and combination-agent
chemotherapy and radiation are relatively ineffective, emphasizing the need for new
approaches and therapies for the treatment of the disease.

A gene therapy approach is being developed for primary and metastatic liver
tumors that exploits the transcriptional differences between normal and metastatic cells.
This approach involves linking the transcriptional regulatory sequences (TRS) of a
tumor-associated marker gene to the coding sequence of an enzyme, typically a
non-mammalian enzyme, to create an artificial chimaeric gene
that is selectively expressed in cancer cells. The enzyme
should be capable of converting a non-toxic prodrug into a
cytotoxic or cytostatic drug, thereby allowing for selective
elimination of metastatic cells.

The principle of this approach has been demonstrated
using an alpha-fetoprotein/Varicella Zoster virus thymidine
kinase chimaera to target hepatocellular carcinoma with the
enzyme metabolically activating the non-toxic prodrug
6-methoxypurine arabinonucleoside, ultimately leading to
formation of the cytotoxic anabolite adenine
arabinonucleoside triphosphate (see Huber et al., Proc.
Natl. Acad. Sci. U.S.A., 88, 8039-8043 (1991) and
EP-A-0 415 731).

For the treatment of hepatic metastases of CRC, it is
desirable to control the expression of an enzyme with the
transcriptional regulatory sequences of a tumor marker
associated with such metastases.

CEA is a tumor-associated marker that is regulated at
the transcriptional level and is expressed by most CRC
tumors but is not expressed in normal liver. CEA is widely
used as an important diagnostic tool for postoperative
surveillance, chemotherapy efficacy determinations,
immunolocalisation and immunotherapy. The TRS of CEA are
potentially of value in the selective expression of an
enzyme in CEA+ tumor cells since there appears to be a very
low heterogeneity of CEA within metastatic tumors, perhaps
because CEA may have an important functional role in
metastasis.

The cloning of the CEA gene has been reported and the
promoter localised to a region of 424 nucleotides upstream
from the translational start (Schrewe et al., Mol. Cell.
Biol., 10, 2738-2748 (1990)) but the full TRS was not
identified.

In the work on which the present invention is based, CEA genomic clones have 
been identified and isolated from the human chromosome 19 genomic library LL19NL01, 
ATCC number 57766, by standard techniques described hereinafter. The cloned CEA 
sequences comprise CEA enhancers in addition to the CEA promoter. The CEA 
enhancers are especially advantageous for high level expression in CEA-positive cells and 
no expression in CEA-negative cells. 

According to one aspect, the present invention provides a DNA molecule 
comprising the CEA TRS but without associated CEA coding sequence. 

According to another aspect, the present invention provides use of a CEA TRS
for targeting expression of a heterologous enzyme to CEA+ cells. Preferably the
enzyme is capable of catalysing the production of an agent cytotoxic or cytostatic to the
CEA+ target cells.

As described in more detail hereinafter, the present inventors have sequenced a 
large part of the CEA gene upstream of the coding sequence. As used herein, the term 
"CEA TRS" means any part of the CEA gene upstream of the coding sequence which 
has a transcriptional regulatory effect on a heterologous coding sequence operably linked 
thereto. 

Certain parts of the sequence of the CEA gene upstream of the coding sequence 
have been identified as making significant contributions to the transcriptional regulatory
effect, more particularly increasing the level and/or selectivity of transcription.
Preferably the CEA TRS includes all or part of the region from about -299b to about 
+69b, more preferably about -90b to about +69b. Increases in the level of transcription 
and/or selectivity can also be obtained by including one or more of the following regions: 
-14.5kb to -10.6kb, preferably -13.6kb to -10.6kb, and/or -6.1kb to -3.8kb. All of the 
regions referred to above can be included in either orientation and in different 
combinations. In addition, repeats of these regions may be included, particularly repeats 
of the -90b to +69b region, containing for example 2, 3, 4 or more copies of the region.
The base numbering refers to the sequence of Figure 6. The regions referred to are
included in the plasmids described in Figure 5B.
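
By way of illustration only, the regions named above may be recorded as coordinate intervals and combined into a candidate TRS arrangement. The following Python sketch is not part of the described work: the region names, the `Region` and `Element` classes and the `describe` helper are hypothetical, and the coordinates are simply the approximate boundaries quoted above, expressed in bases relative to the numbering of Figure 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An upstream region, in bases relative to the Figure 6 numbering."""
    name: str   # hypothetical label, not taken from the specification
    start: int  # 5' boundary (negative values lie upstream)
    end: int    # 3' boundary

    def contains(self, other: "Region") -> bool:
        """True if `other` lies entirely within this region."""
        return self.start <= other.start and other.end <= self.end


# Approximate boundaries quoted in the text ("about -299b to about +69b", etc.).
PROMOTER_WIDE = Region("promoter_wide", -299, 69)
PROMOTER_CORE = Region("promoter_core", -90, 69)
DISTAL_WIDE = Region("distal_wide", -14500, -10600)
DISTAL_PREFERRED = Region("distal_preferred", -13600, -10600)
PROXIMAL = Region("proximal", -6100, -3800)


@dataclass(frozen=True)
class Element:
    """One region placed in a construct, with copy number and orientation."""
    region: Region
    copies: int = 1
    reversed_: bool = False  # regions may be included in either orientation


def describe(elements):
    """Render a construct as a readable 5'-to-3' summary string."""
    parts = []
    for element in elements:
        arrow = "<-" if element.reversed_ else "->"
        parts.append(f"{element.copies}x {element.region.name} {arrow}")
    return " | ".join(parts)


layout = [Element(DISTAL_PREFERRED, reversed_=True),
          Element(PROMOTER_CORE, copies=2)]
print(describe(layout))
```

The sketch only records which regions are combined, in which orientation and in how many copies; it makes no prediction about the resulting level or selectivity of transcription.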

Gene therapy involves the stable integration of new genes into target cells and the 
expression of those genes, once they are in place, to alter the phenotype of that particular 
target cell (for review see Anderson, W.F. Science 226, 401-409 (1984) and 
McCormick, D. Biotechnology 3, 689-693, (1985)). Gene therapy may be beneficial for 
the treatment of genetic diseases that involve the replacement of one defective or missing 
enzyme, such as hypoxanthine-guanine phosphoribosyl transferase in Lesch-Nyhan
disease, purine nucleoside phosphorylase in severe immunodeficiency disease, and
adenosine deaminase in severe combined immunodeficiency disease.

It has now been found that it is possible to selectively arrest the growth of, or
kill, mammalian carcinoma cells with prodrugs, i.e. chemical agents capable
of selective conversion to cytotoxic (causing cell death)
or cytostatic (suppressing cell multiplication and growth)
metabolites. This is achieved by the construction of a
molecular chimaera comprising a "target tissue-specific"
TRS that is selectively activated in target cells, such as
cancerous cells, and that controls the expression of a
heterologous enzyme. This molecular chimaera may be
manipulated via suitable vectors and incorporated into an
infective virion. Upon administration of an infective
virion containing the molecular chimaera to a host (e.g.,
mammal or human), the enzyme is selectively expressed in
the target cells. Administration of prodrugs (compounds
that are selectively metabolised by the enzyme into
metabolites that are either further metabolised to or are,
in fact, cytotoxic or cytostatic agents) can then result in
the production of the cytotoxic or cytostatic agent in situ
in the cancer cell. According to the present invention CEA
TRS provides the target tissue specificity.

Molecular chimaeras (recombinant molecules comprised
of unnatural combinations of genes or sections of genes),
and infective virions (complete viral particles capable of
infecting appropriate host cells) are well known in the art
of molecular biology.

A number of enzyme prodrug combinations may be used
for the above purpose, providing the enzyme is capable of
selectively activating the administered compound either
directly or through an intermediate to a cytostatic or
cytotoxic metabolite. The choice of compound will also
depend on the enzyme system used, but must be selectively
metabolised by the enzyme either directly or indirectly to
a cytotoxic or cytostatic metabolite. The term
heterologous enzyme, as used herein, refers to an enzyme
that is derived from or associated with a species which is
different from the host to be treated and which will
display the appropriate characteristics of the above-mentioned
selectivity. In addition, it will also be
appreciated that a heterologous enzyme may also refer to an
enzyme that is derived from the host to be treated that has
been modified to have unique characteristics unnatural to
the host.

The enzyme cytosine deaminase (CD) catalyses the
deamination of cytosine to uracil. Cytosine deaminase is
present in microbes and fungi but absent in higher
eukaryotes. This enzyme catalyses the hydrolytic
deamination of cytosine and 5-fluorocytosine (5-FC) to
uracil and 5-fluorouracil (5-FU), respectively. Since
mammalian cells do not express significant amounts of
cytosine deaminase, they are incapable of converting 5-FC
to the toxic metabolite 5-FU and therefore 5-fluorocytosine
is nontoxic to mammalian cells at concentrations which are
effective for antimicrobial activity. 5-Fluorouracil is
highly toxic to mammalian cells and is widely used as an
anticancer agent.

In mammalian cells, some genes are ubiquitously
expressed. Most genes, however, are expressed in a
temporal and/or tissue-specific manner, or are activated in
response to extracellular inducers. For example, certain
genes are actively transcribed only at very precise times
in ontogeny in specific cell types, or in response to some
inducing stimulus. This regulation is mediated in part by
the interaction between transcriptional regulatory
sequences (for example, promoter and enhancer regulatory
DNA sequences), and sequence-specific, DNA-binding
transcriptional protein factors.

It has now been found that it is possible to alter
certain mammalian cells, e.g. colorectal carcinoma cells,
metastatic colorectal carcinoma cells and hepatic
colorectal carcinoma cells, to selectively express a
heterologous enzyme as hereinbefore defined, e.g. CD. This
is achieved by the construction of molecular chimaeras in
an expression cassette.

Expression cassettes themselves are well known in the
art of molecular biology. Such an expression cassette
contains all essential DNA sequences required for
expression of the heterologous enzyme in a mammalian cell.
For example, a preferred expression cassette will contain
a molecular chimaera containing the coding sequence for CD,
an appropriate polyadenylation signal for a mammalian gene
(i.e., a polyadenylation signal that will function in a
mammalian cell), and CEA enhancers and promoter sequences
in the correct orientation.
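
The composition of such a cassette can be pictured as an ordered list of elements with a simple consistency check. The Python sketch below is illustrative only and rests on stated assumptions: the element names, the `+`/`-` strand notation and the `check_cassette` function are hypothetical, and the check covers only the presence, 5'-to-3' order and common orientation of the promoter, coding sequence and polyadenylation signal. An enhancer entry is left unconstrained in position and orientation.

```python
# Elements that the sketch treats as mandatory, listed in 5'-to-3' order.
REQUIRED = ("CEA_promoter", "CD_coding_sequence", "polyA_signal")


def check_cassette(parts):
    """Return a list of problems found in a cassette description.

    `parts` is a list of (name, strand) pairs given 5' to 3', where strand
    is "+" or "-". An empty list means no problem was detected.
    """
    problems = []
    names = [name for name, _ in parts]

    # Every required element must be present.
    for required in REQUIRED:
        if required not in names:
            problems.append(f"missing {required}")
    if problems:
        return problems

    # Promoter must precede the coding sequence, which precedes the polyA signal.
    positions = [names.index(required) for required in REQUIRED]
    if positions != sorted(positions):
        problems.append("promoter, coding sequence and polyA signal are out of order")

    # The three required elements must share one orientation.
    strands = {strand for name, strand in parts if name in REQUIRED}
    if len(strands) != 1:
        problems.append("promoter, coding sequence and polyA signal differ in orientation")

    return problems


cassette = [
    ("CEA_enhancer", "-"),        # enhancer orientation is not checked
    ("CEA_promoter", "+"),
    ("CD_coding_sequence", "+"),
    ("polyA_signal", "+"),
]
print(check_cassette(cassette))
```

A check of this kind says nothing about whether a given cassette is expressed; it only catches a missing or misplaced element in a written description of the construct.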

Normally, two DNA sequences are required for the
complete and efficient transcriptional regulation of genes
that encode messenger RNAs in mammalian cells: promoters
and enhancers. Promoters are located immediately upstream
(5') from the start site of transcription. Promoter
sequences are required for accurate and efficient
initiation of transcription. Different gene-specific
promoters reveal a common pattern of organisation. A
typical promoter includes an AT-rich region called a TATA
box (which is located approximately 30 base pairs 5' to the
transcription initiation start site) and one or more
upstream promoter elements (UPEs). The UPEs are a
principal target for the interaction with sequence-specific
nuclear transcriptional factors. The activity of promoter
sequences is modulated by other sequences called enhancers.
The enhancer sequence may be a great distance from the
promoter in either an upstream (5') or downstream (3')
position. Hence, enhancers operate in an orientation- and
position-independent manner. However, based on similar
structural organisation and function that may be
interchanged, the absolute distinction between promoters
and enhancers is somewhat arbitrary. Enhancers increase
the rate of transcription from the promoter sequence.

WO 95/14100 PCT/GB94/02546

It is predominantly the interaction between sequence-specific
transcriptional factors and the UPE and enhancer sequences
that enables mammalian cells to achieve tissue-specific gene
expression. The presence of these transcriptional protein
factors (tissue-specific, trans-activating factors) bound
to the UPE and enhancers (cis-acting, regulatory sequences)
enables other components of the transcriptional machinery,
including RNA polymerase, to initiate transcription with
tissue-specific selectivity and accuracy.

The transcriptional regulatory sequence for CEA is
suitable for targeting expression in colorectal carcinoma,
metastatic colorectal carcinoma and hepatic colorectal
metastases, and in transformed cells of the gastrointestinal
tract, lung, breast and other tissues. By placing the
expression of the gene encoding CD under the
transcriptional control of the CRC-associated marker gene,
CEA, the nontoxic compound, 5-FC, can be metabolically
activated to 5-FU selectively in CRC cells (for example,
hepatic CRC cells). An advantage of this system is that
the generated toxic compound, 5-fluorouracil, can diffuse
out of the cell in which it was generated and kill adjacent
tumor cells which did not incorporate the artificial gene
for cytosine deaminase.

In the work on which the present invention is based,
CEA genomic clones were identified and isolated from the
human chromosome 19 genomic library LL19NL01, ATCC number
57766, by standard techniques described hereinafter. The
cloned CEA sequences comprise CEA enhancers in
addition to the CEA promoter. The CEA enhancers are
especially advantageous for high level expression in
CEA-positive cells and no expression in CEA-negative cells.

The present invention further provides a molecular
chimaera comprising a CEA TRS and a DNA sequence
operatively linked thereto encoding a heterologous enzyme,
preferably an enzyme capable of catalysing the production
of an agent cytotoxic or cytostatic to the CEA+ cells.

The present invention further provides a molecular
chimaera comprising a DNA sequence containing the coding
sequence of the gene that codes for a heterologous enzyme
under the control of a CEA TRS in an expression cassette.

The present invention further provides in a preferred
embodiment a molecular chimaera comprising a CEA TRS which
is operatively linked to the coding sequence for the gene
encoding a non-mammalian cytosine deaminase (CD). The
molecular chimaera comprises a promoter and additionally
comprises an enhancer.

In particular, the present invention provides a
molecular chimaera comprising a DNA sequence of the coding
sequence of the gene coding for the heterologous enzyme,
which is preferably CD, additionally including an
appropriate polyadenylation sequence, which is linked
downstream in a 3' position and in the proper orientation
to a CEA TRS. Most preferably the expression cassette also
contains an enhancer sequence.

Preferably non-mammalian CD is selected from the group
consisting of bacterial, fungal, and yeast cytosine
deaminase.

The molecular chimaera of the present invention may be
made utilizing standard recombinant DNA techniques.

Another aspect of the invention is the genomic CEA
sequence as described by Seq. ID 1.

The coding sequence of CD and a polyadenylation signal
(for example see Seq IDs 1 and 2) are placed in the
proper 3' orientation to the essential CEA transcriptional
regulatory elements. This molecular chimaera enables the
selective expression of CD in cells or tissue that normally
express CEA. Expression of the CD gene in mammalian CRC
and metastatic CRC (hepatic colorectal carcinoma
metastases) will enable nontoxic 5-FC to be selectively
metabolised to cytotoxic 5-FU.

Accordingly, in another aspect of the present
invention, there is provided a method of constructing a
molecular chimaera comprising linking a DNA sequence
encoding a heterologous enzyme gene, e.g. CD, to a CEA TRS.

In particular the present invention provides a method 
of constructing a molecular chimaera as herein defined, the 
method comprising ligating a DNA sequence encoding the 
coding sequence and polyadenylation signal of the gene for 
a heterologous enzyme (e.g. non-mammalian CD) to a CEA TRS 
(e.g., promoter sequence and enhancer sequence). 

These molecular chimaeras can be delivered to the
target tissue or cells by a delivery system. For
administration to a host (e.g., mammal or human), it is
necessary to provide an efficient in vivo delivery system
that stably incorporates the molecular chimaera into the
cells. Known methods utilize techniques of calcium
phosphate transfection, electroporation, microinjection,
liposomal transfer, ballistic barrage, DNA viral infection
or retroviral infection. For a review of this subject see
Biotechniques 6, No. 7 (1988).

The technique of retroviral infection of cells to
integrate artificial genes employs retroviral shuttle
vectors which are known in the art (Miller A.D., Buttimore
C. Mol. Cell. Biol. 6, 2895-2902 (1986)). Essentially,
retroviral shuttle vectors (retroviruses comprising
molecular chimaeras used to deliver and stably integrate
the molecular chimaera into the genome of the target cell)
are generated using the DNA form of the retrovirus
contained in a plasmid. These plasmids also contain
sequences necessary for selection and growth in bacteria.
Retroviral shuttle vectors are constructed using standard
molecular biology techniques well known in the art.

Retroviral shuttle vectors have the parental endogenous
retroviral genes (e.g., gag, pol and env) removed from the
vectors and the DNA sequence of interest is inserted, such
as the molecular chimaeras that have been described. The
vectors also contain appropriate retroviral regulatory
sequences for viral encapsidation, proviral insertion into
the target genome, message splicing, termination and
polyadenylation. Retroviral shuttle vectors have been
derived from the Moloney murine leukaemia virus (Mo-MLV)
but it will be appreciated that other retroviruses can be
used such as the closely related Moloney murine sarcoma
virus. Other DNA viruses may also prove to be useful as
delivery systems. The bovine papilloma virus (BPV)
replicates extrachromosomally, so that delivery systems
based on BPV have the advantage that the delivered gene is
maintained in a nonintegrated manner.

Thus according to a further aspect of the present 
invention there is provided a retroviral shuttle vector 
comprising the molecular chimaeras as hereinbefore 
defined. 

The advantages of a retroviral-mediated gene transfer
system are the high efficiency of the gene delivery to the
targeted tissue or cells, sequence-specific integration
regarding the viral genome (at the 5' and 3' long terminal
repeat (LTR) sequences) and little rearrangement of
delivered DNA compared to other DNA delivery systems.

Accordingly in a preferred embodiment of the present
invention there is provided a retroviral shuttle vector
comprising a DNA sequence comprising a 5' viral LTR
sequence, a cis-acting psi-encapsidation sequence, a
molecular chimaera as hereinbefore defined and a 3' viral
LTR sequence.

In a preferred embodiment, and to help eliminate
non-tissue-specific expression of the molecular chimaera, the
molecular chimaera is placed in opposite transcriptional
orientation to the 5' retroviral LTR. In addition, a
dominant selectable marker gene may also be included that
is transcriptionally driven from the 5' LTR sequence. Such
a dominant selectable marker gene may be the bacterial
neomycin-resistance gene NEO (aminoglycoside 3'
phosphotransferase type II), which confers on eukaryotic cells
resistance to the neomycin analogue Geneticin (antibiotic
G418 sulphate; registered trademark of GIBCO). The NEO
gene aids in the selection of packaging cells that contain
these sequences.
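
The arrangement just described can be written out as an ordered list of elements. The following Python sketch is a schematic aid only: the element names and both functions are hypothetical, and placing NEO between the psi sequence and the chimaera is an assumption made for illustration, since the text states only that NEO is driven from the 5' LTR.

```python
def build_shuttle_vector(include_neo=True):
    """Return the vector elements 5' to 3' as (name, strand) pairs."""
    elements = [("5_LTR", "+"), ("psi", "+")]
    if include_neo:
        # Dominant selectable marker, transcribed from the 5' LTR.
        # Its position here is an illustrative assumption.
        elements.append(("NEO", "+"))
    # The chimaera is placed in opposite transcriptional orientation
    # to the 5' LTR to help reduce non-tissue-specific expression.
    elements.append(("CEA_TRS_CD_chimaera", "-"))
    elements.append(("3_LTR", "+"))
    return elements


def chimaera_opposes_ltr(elements):
    """True if the chimaera and the 5' LTR are on opposite strands."""
    strands = dict(elements)
    return strands["CEA_TRS_CD_chimaera"] != strands["5_LTR"]


vector = build_shuttle_vector()
for name, strand in vector:
    print(f"{strand} {name}")
print(chimaera_opposes_ltr(vector))
```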

The retroviral vector is preferably based on the
Moloney murine leukaemia virus but it will be appreciated
that other vectors may be used. Vectors containing a NEO
gene as a selectable marker have been described, for
example, the N2 vector (Eglitis M.A., Kantoff P., Gilboa
E., Anderson W.F. Science 230, 1395-1398 (1985)).

A theoretical problem associated with retroviral
shuttle vectors is the potential of retroviral long
terminal repeat (LTR) regulatory sequences
transcriptionally activating a cellular oncogene at the
site of integration in the host genome. This problem may
be diminished by creating SIN vectors. SIN vectors are
self-inactivating vectors that contain a deletion
comprising the promoter and enhancer regions in the
retroviral LTR. The LTR sequences of SIN vectors do not
transcriptionally activate 5' or 3' genomic sequences. The
transcriptional inactivation of the viral LTR sequences
diminishes insertional activation of adjacent target cell
DNA sequences and also aids in the selected expression of
the delivered molecular chimaera. SIN vectors are created
by removal of approximately 299 bp in the 3' viral LTR
sequence (Gilboa E., Eglitis M.A., Kantoff P.W., Anderson
W.F. Biotechniques 4, 504-512 (1986)).

Thus preferably the retroviral shuttle vectors of the 
present invention are SIN vectors. 

Since the parental retroviral gag, pol and env genes
have been removed from these shuttle vectors, a helper
virus system may be utilised to provide the gag, pol and
env retroviral gene products in trans to package or
encapsidate the retroviral vector into an infective virion.
This is accomplished by utilising specialised "packaging"
cell lines, which are capable of generating infectious,
synthetic virus yet are deficient in the ability to produce
any detectable wild-type virus. In this way the artificial
synthetic virus contains a chimaera of the present
invention packaged into synthetic artificial infectious
virions free of wild-type helper virus. This is based on
the fact that the helper virus that is stably integrated
into the packaging cell contains the viral structural
genes, but is lacking the psi-site, a cis-acting regulatory
sequence which must be contained in the viral genomic RNA
molecule for it to be encapsidated into an infectious viral
particle.
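
The packaging logic described above reduces to two conditions: a genome is encapsidated only if it carries the psi-site, and it can propagate on its own only if it also carries the structural genes. The Python sketch below expresses that reasoning under stated assumptions; the set-based genome representation and all function names are hypothetical simplifications, and recombination between genomes is ignored.

```python
STRUCTURAL_GENES = {"gag", "pol", "env"}


def is_encapsidated(genome):
    """A genomic RNA is packaged only if it carries the psi-site."""
    return "psi" in genome


def is_replication_competent(genome):
    """Self-propagation needs the psi-site and all structural genes."""
    return "psi" in genome and STRUCTURAL_GENES <= genome


def virion_contents(cell_genomes):
    """Names of the genomes in a packaging cell that end up in virions."""
    return [name for name, genome in cell_genomes.items()
            if is_encapsidated(genome)]


packaging_cell = {
    # Helper virus: supplies gag, pol and env in trans, lacks psi.
    "helper": {"gag", "pol", "env"},
    # Shuttle vector: carries psi and the chimaera, lacks structural genes.
    "shuttle_vector": {"5_LTR", "psi", "chimaera", "3_LTR"},
}

packaged = virion_contents(packaging_cell)
print(packaged)
print([is_replication_competent(packaging_cell[name]) for name in packaged])
```

Under these assumptions only the shuttle vector is packaged, and the packaged genome is replication-defective because it lacks gag, pol and env.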

Accordingly, in a still further aspect of the present
invention, there is provided an infective virion comprising
a retroviral shuttle vector, as hereinbefore described,
said vector being encapsidated within viral proteins to
create an artificial, infective, replication-defective
retrovirus.

In another aspect of the present invention there is
provided a method for producing infective virions of the
present invention by delivering the artificial retroviral
shuttle vector comprising a molecular chimaera of the
invention, as hereinbefore described, into a packaging cell
line.

The packaging cell line may have stably integrated
within it a helper virus lacking a psi-site and other
regulatory sequence, as hereinbefore described, or,
alternatively, the packaging cell line may be engineered so
as to contain helper virus structural genes within its
genome. In addition to removal of the psi-site, additional
alterations can be made to the helper virus LTR regulatory
sequences to ensure that the helper virus is not packaged
in virions and is blocked at the level of reverse
transcription and viral integration. Alternatively, helper
virus structural genes (i.e., gag, pol and env) may be
individually and independently transferred into the
packaging cell line. Since these viral structural genes
are separated within the packaging cell's genome, there is
little chance of covert recombinations generating wild-type
virus.

The present invention also provides a packaging cell
line comprising an infective virion, as described
hereinbefore, said virion further comprising a retroviral
shuttle vector.

The present invention further provides for a packaging
cell line comprising a retroviral shuttle vector as
described hereinbefore.

In addition to retroviral-mediated gene delivery of
the chimeric, artificial, therapeutic gene, other gene
delivery systems known to those skilled in the art can be
used in accordance with the present invention. These other
gene delivery systems include other viral gene delivery
systems known in the art, such as the adenovirus delivery
system.

Non-viral delivery systems can be utilized in
accordance with the present invention as well. For
example, liposomal delivery systems can deliver the
therapeutic gene to the tumor site via a liposome.
Liposomes can be modified to evade metabolism and/or to
have distinct targeting mechanisms associated with them.
For example, liposomes which have antibodies incorporated
into their structure, such as antibodies to CEA, can have
targeting ability to CEA-positive cells. This will
increase both the selectivity of the present invention as
well as its ability to treat disseminated disease
(metastasis).

Another gene delivery system which can be utilized
according to the present invention is receptor-mediated
delivery, wherein the gene of choice is incorporated into
a ligand which recognizes a specific cell receptor. This
system can also deliver the gene to a specific cell type.
Additional modifications can be made to this
receptor-mediated delivery system, such as incorporation of
adenovirus components to the gene so that the gene is not
degraded by the cellular lysosomal compartment after
internalization by the receptor.

The infective virion or the packaging cell line
according to the invention may be formulated by techniques
well known in the art and may be presented as a formulation
(composition) with a pharmaceutically acceptable carrier
therefor. Pharmaceutically acceptable carriers, in this
instance physiologic aqueous solutions, may comprise liquid
medium suitable for use as vehicles to introduce the
infective virion into a host. An example of such a carrier
is saline. The infective virion or packaging cell line may
be a solution or suspension in such a vehicle. Stabilizers
and antioxidants and/or other excipients may also be
present in such pharmaceutical formulations (compositions),
which may be administered to a mammal by any conventional
method (e.g., oral or parenteral routes). In particular,
the infective virion may be administered by intra-venous or
intra-arterial infusion. In the case of treating hepatic
metastatic CRC, intra-hepatic arterial infusion may be
advantageous. The packaging cell line can be administered
directly to the tumor or near the tumor and thereby produce
infective virions directly at or near the tumor site.

Accordingly, the present invention provides a
pharmaceutical formulation (composition) comprising an
infective virion or packaging cell line according to the
invention in admixture with a pharmaceutically acceptable
carrier.

Additionally, the present invention provides methods 
of making pharmaceutical formulations (compositions), as
herein described, comprising mixing an artificial infective 
virion, containing a molecular chimaera according to the 
invention as described hereinbefore, with a 
pharmaceutically acceptable carrier. 



The present invention also provides methods of making
pharmaceutical formulations (compositions), as herein
described, comprising mixing a packaging cell line,
containing an infective virion according to the invention
as described hereinbefore, with a pharmaceutically
acceptable carrier.

Although any suitable compound that can be selectively
converted to a cytotoxic or cytostatic metabolite by the
enzyme cytosine deaminase may be utilised, the preferred
compound for use according to the invention is 5-FC, in
particular for use in treating colorectal carcinoma (CRC),
metastatic colorectal carcinoma, or hepatic CRC metastases.
5-FC, which is non-toxic and is used as an antifungal, is
converted by CD into the established cancer therapeutic
5-FU.



Any agent that can potentiate the antitumor effects of
5-FU can also potentiate the antitumor effects of 5-FC
since, when used according to the present invention, 5-FC
is selectively converted to 5-FU. According to another
aspect of the present invention, agents such as leucovorin
and levamisole, which can potentiate the antitumor effects
of 5-FU, can also be used in combination with 5-FC when
5-FC is used according to the present invention. Other
agents which can potentiate the antitumor effects of 5-FU
are agents which block the metabolism of 5-FU. Examples of
such agents are 5-substituted uracil derivatives, for
example, 5-ethynyluracil and 5-bromovinyluracil
(PCT/GB91/01650 (WO 92/04901); Cancer Research 46, 1094
(1986), which are incorporated herein by reference in their
entirety). Therefore, a further aspect of the present
invention is the use of an agent which can potentiate the
antitumor effects of 5-FU, for example, a 5-substituted
uracil derivative such as 5-ethynyluracil or
5-bromovinyluracil, in combination with 5-FC when 5-FC is used
according to the present invention. The present invention
further includes the use of agents which are metabolised in
vivo to the corresponding 5-substituted uracil derivatives
described hereinbefore (see Biochemical Pharmacology 28,
2885 (1989), which is incorporated herein by reference in
its entirety) in combination with 5-FC when 5-FC is used
according to the present invention.

5-FC is readily available (e.g., United States
Biochemical, Sigma) and well known in the art. Leucovorin
and levamisole are also readily available and well known in
the art.

Two significant advantages of the enzyme/prodrug
combination of cytosine deaminase/5-fluorocytosine and
further aspects of the invention are the following:

1. The metabolic conversion of 5-FC by CD produces 5-FU
which is the drug of choice in the treatment of many
different types of cancers, such as colorectal carcinoma.

2. The 5-FU that is selectively produced in one cancer
cell can diffuse out of that cell and be taken up by both
non-facilitated diffusion and facilitated diffusion into
adjacent cells. This produces a neighbouring cell killing
effect. This neighbour cell killing effect alleviates the
necessity for delivery of the therapeutic molecular chimaera
to every tumor cell. Rather, delivery of the molecular
chimaera to a certain percentage of tumor cells can produce
the complete eradication of all tumor cells.

The amounts and precise regimen in treating a mammal
will of course be the responsibility of the attendant
physician, and will depend on a number of factors including
the type and severity of the condition to be treated.
However, for hepatic metastatic CRC, an intrahepatic
arterial infusion of the artificial infective virion at a
titer of between 2 x 10^5 and 2 x 10^7 colony forming units
per ml (CFU/ml) infective virions is suitable for a typical
tumour. The total amount of virions infused will be
dependent on tumour size and is preferably given in divided
doses.

Likewise, the packaging cell line is administered
directly to a tumor in an amount of between 2 x 10^5 and
2 x 10^7 cells. The total amount of packaging cell line
infused will be dependent on tumour size and is preferably
given in divided doses.

Prodrug treatment - Subsequent to infection with the
infective virion, certain cytosine compounds (prodrugs of
5-FU) are converted by CD to cytotoxic or cytostatic
metabolites (e.g. 5-FC is converted to 5-FU) in target
cells. The above mentioned prodrug compounds are
administered to the host (e.g. mammal or human) between six
hours and ten days, preferably between one and five days,
after administration of the infective virion.

The dose of 5-FC to be given will advantageously be in
the range 10 to 500 mg per kg body weight of recipient per
day, preferably 50 to 500 mg per kg body weight of recipient
per day, more preferably 50 to 250 mg per kg body weight of
recipient per day, and most preferably 50 to 150 mg per kg
body weight of recipient per day. The modes of
administration of 5-FC in humans are well known to those
skilled in the art. Oral administration and/or constant
intravenous infusion of 5-FC are anticipated by the instant
invention to be preferable.
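
The per-kilogram ranges above translate into a total daily
amount by simple multiplication. The following Python sketch
shows only that arithmetic; the function name, the tier labels
and the 70 kg body weight are illustrative assumptions and are
not taken from the specification.

```python
# Daily 5-FC dose ranges quoted in the text, in mg per kg body weight per day.
# The tier labels are hypothetical names chosen for this sketch.
DOSE_RANGES_MG_PER_KG = {
    "broad": (10, 500),
    "preferred": (50, 500),
    "more preferred": (50, 250),
    "most preferred": (50, 150),
}


def daily_dose_mg(body_weight_kg, tier="most preferred"):
    """Return (low, high) total daily dose in mg for the given tier."""
    low, high = DOSE_RANGES_MG_PER_KG[tier]
    return low * body_weight_kg, high * body_weight_kg


if __name__ == "__main__":
    # Assumed example recipient of 70 kg body weight.
    for tier in DOSE_RANGES_MG_PER_KG:
        low, high = daily_dose_mg(70, tier)
        print(f"{tier}: {low} to {high} mg per day")
```

For the assumed 70 kg recipient, the most preferred range of
50 to 150 mg per kg per day corresponds to 3500 to 10500 mg
per day.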

The doses and mode of administration of leucovorin and
levamisole to be used in accordance with the present
invention are well known or readily determined by those
clinicians skilled in the art of oncology.

The dose and mode of administration of the 5- 
substituted uracil derivatives can be determined by the 
skilled oncologist. Preferably, these derivatives are 
given by intravenous injection or orally at a dose of 
between 0.01 to 50 mg per kg body weight of the recipient 
per day, particularly 0.01 to 10 mg per kg body weight per 
day, and more preferably 0.01 to 0.4 mg per kg bodyweight 
per day depending on the derivative used. An alternative 
preferred administration regime is 0.5 to 10 mg per kg body
weight of recipient once per week.

The following examples serve to illustrate the present 
invention but should not be construed as a limitation 
thereof. In the Examples reference is made to the Figures 
a brief description of which is as follows: 
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Figure 1: Diagram of CEA phage clones. The 
overlapping clones lambdaCEA1, lambdaCEA7, and lambdaCEA5
represent an approximately 26 kb region of CEA genomic
sequence. The 11,288 bp HindIII-Sau3A fragment that was
sequenced is represented by the heavy line under
lambdaCEA1. The 3774 bp HindIII-HindIII fragment that was
sequenced is represented by the heavy line under
lambdaCEA7. The bent arrows represent the transcription
start point for CEA mRNA. The straight arrows represent
the oligonucleotides CR15 and CR16. H, HindIII; S, SstI;
B, BamHI; E, EcoRI; X, XbaI.

Figure 2: Restriction map of part of lambdaCEA1. The
arrow head represents the approximate location of the
transcription initiation point for CEA mRNA. Lines below
the map represent the CEA inserts of pBS+ subclones. These
subclones are convenient sources for numerous CEA
restriction fragments.

DNA sequence of the 11,288 bp HindIII to Sau3A fragment
of lambdaCEA7 (SEQ ID NO: 1). Sequence is numbered with the
approximate transcription initiation point for CEA mRNA as
0 (this start site is approximate because there is some
slight variability in the start site among individual CEA
transcripts). The translation of the first exon is shown.
Intron 1 extends from +172 to beyond +592. Several
restriction sites are shown above the sequence. In subclone
109-3 the sequence at position +70 has been altered by
site-directed mutagenesis in order to introduce HindIII and
EcoRI restriction sites.

DNA sequence of the 3774 bp HindIII to HindIII fragment
of lambdaCEA7 (SEQ ID NO: 2).

Figure 3: Mapplot of 15,056 bp HindIII to Sau3A
fragment from CEA genomic DNA showing consensus sequences.
Schematic representation of some of the consensus sequences
found in the CEA sequence of Seq IDs 1 and 2. The
consensus sequences shown here are from the transcriptional
dictionary of Locker and Buzard (DNA Sequence 1, 3-11
(1990)). The lysozymal silencer is coded 513. The last
line represents 90% homology to the topoisomerase II
cleavage consensus.
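
The 15,056 bp figure can be reconciled with the two sequenced
fragments (3774 bp and 11,288 bp) by a short calculation. The
sketch below assumes, as an inference not stated in the text,
that the two fragments are contiguous and overlap by the shared
6 bp HindIII recognition site (AAGCTT). The function name is
hypothetical.

```python
# Fragment lengths quoted in the text.
SEQ_ID_1_LEN = 11288  # HindIII to Sau3A fragment (SEQ ID NO: 1)
SEQ_ID_2_LEN = 3774   # HindIII to HindIII fragment (SEQ ID NO: 2)

# HindIII recognition sequence; assumed here to be counted in both fragments.
HINDIII_SITE = "AAGCTT"


def joined_length(upstream_len, downstream_len, shared_len):
    """Length of two contiguous fragments that overlap by shared_len bases."""
    return upstream_len + downstream_len - shared_len


if __name__ == "__main__":
    total = joined_length(SEQ_ID_2_LEN, SEQ_ID_1_LEN, len(HINDIII_SITE))
    print(total)  # 3774 + 11288 - 6
```

Under that assumption the combined length is 3774 + 11288 - 6
= 15056 bp, which matches the length given for the mapplot.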

Figure 4: Cloning scheme for CEA constructs extending
from -299 bp to +69 bp.

Figure 5A: Cloning scheme for CEA constructs
extending from -10.7 kb to +69 bp.

Figure 5B: Coordinates for CEA sequence present in
several CEA/luciferase clones. CEA sequences were cloned
into the multiple cloning region of pGL2-Basic (Promega
Corp.) by standard techniques.

Figures 5C and 5D: Transient luciferase assays.
Transient transfections and luciferase assays were
performed in quadruplicate by standard techniques using
DOTAP (Boehringer Mannheim, Indianapolis, IN, USA),
luciferase assay system (Promega, Madison, WI, USA), and
Dynatech luminometer (Chantilly, VA, USA). CEA-positive
cell lines included LoVo (ATCC #CCL 229) and SW1463 (ATCC
#CCL 234). CEA-negative cell lines included HuH7 and Hep3B
(ATCC #HB 8064). C. Luciferase activity expressed as the
percent of pGL2-Control plasmid activity. D. Luciferase
activities of LoVo and SW1463 expressed as fold increase
over activity in Hep3B.

Example 1

Construction of transcriptional regulatory sequence of
carcinoembryonic antigen/cytosine deaminase molecular chimera

A) Cloning and isolation of the transcriptional regulatory
sequence of the carcinoembryonic antigen gene

CEA genomic clones were identified and isolated from
the human chromosome 19 genomic library LL19NL01, ATCC
#51166, by standard techniques (Richards et al., Cancer
Research, 50, 1521-1527 (1990) which is herein incorporated
by reference in its entirety). The CEA clones were
identified by plaque hybridization to end-labelled
oligonucleotides CR15 and CR16. CR15,
5'-CCCTGTGATCTCCAGGACAGCTCAGTCTC-3' (SEQ ID NO: 3), and CR16,
5'-GTTTCCTGAGTGATGTCTG7C-TGCAATG-3' (SEQ ID NO: 4),
hybridize to a 5' non-transcribed region of CEA that has
little homology to other members of the CEA gene family.
Phage DNA was isolated from three clones that hybridized to
both oligonucleotide probes. Polymerase chain reaction,
restriction mapping, and DNA sequence analysis confirmed
that the three clones contained CEA genomic sequences. The
three clones are designated lambdaCEA1, lambdaCEA5, and
lambdaCEA7 and have inserts of approximately 13.5, 16.2,
and 16.7 kb respectively. A partial restriction map of the
three overlapping clones is shown in Figure 1.



Clone lambdaCEA1 was initially chosen for extensive
analysis. Fragments isolated from lambdaCEA1 were subcloned
using standard techniques into the plasmid pBS+ (Stratagene
Cloning Systems, La Jolla, CA, USA) to facilitate
sequencing, site-directed mutagenesis, and construction of
chimeric genes. The inserts of some clones are represented
in Figure 2. The complete DNA sequence of a 11,288 bp
HindIII/Sau3A restriction fragment from lambdaCEA1
(SEQ ID NO: 1) was determined by the dideoxy sequencing
method using the dsDNA Cycle Sequencing System from Life
Technologies, Inc. and multiple oligonucleotide primers.
This sequence extends from -10.7 kb to +0.6 kb relative to
the start site of CEA mRNA. The sequence of a 3774 base
pair HindIII restriction fragment from lambdaCEA1 was also
determined (SEQ ID NO: 2). This sequence extends
from -14.5 kb to -10.7 kb relative to the start site of CEA
mRNA. This HindIII fragment is present in plasmid pCR145.

To determine important transcriptional regulatory
sequences, various fragments of CEA genomic DNA are linked
to a reporter gene such as luciferase or chloramphenicol
acetyltransferase. Various fragments of CEA genomic DNA are
tested to determine the optimized, cell-type specific TRS
that results in high level reporter gene expression in
CEA-positive cells but not in CEA-negative cells. The
various reporter constructs, along with appropriate
controls, are transfected into tissue culture cell lines
that express high, low, or no CEA. The reporter gene
analysis identifies both positive and negative
transcriptional regulatory sequences. The optimized
CEA-specific TRS is identified through the reporter gene
analysis and is used to specifically direct the expression
of any desired linked coding sequence, such as CD or VZV
TK, in cancerous cells that express CEA. The optimized
CEA-specific TRS, as used herein, refers to any DNA
construct that directs suitably high levels of expression
in CEA-positive cells and low or no expression in
CEA-negative cells. The optimized CEA-specific TRS consists
of one or several different fragments of CEA genomic
sequence or multimers of selected sequences that are linked
together by standard recombinant DNA techniques. It will be
appreciated by those skilled in the art that the optimized
CEA-specific TRS may also include some sequences that are
not derived from the CEA genomic sequences shown in Seq IDs
1 and 2. These other sequences may include sequences from
adjoining regions of the CEA locus, such as sequences from
the introns, or sequences further upstream or downstream
from the sequenced DNA shown in Seq IDs 1 and 2, or they
could include transcriptional control elements from other
genes that when linked to selected CEA sequences result in
the desired CEA-specific regulation.

The CEA sequences of Seq IDs 1 and 2 were computer
analyzed for characterized consensus sequences which have
been associated with gene regulation. Currently not enough
is known about transcriptional regulatory sequences to
accurately predict by sequence alone whether a sequence
will be functional. However, computer searches for
characterized consensus sequences can help identify
transcriptional regulatory sequences in uncharacterized
sequences since many enhancers and promoters consist of
unique combinations and spatial alignments of several
characterized consensus sequences as well as other
sequences. Since not all transcriptional regulatory
sequences have been identified and not all sequences that
are identical to characterized consensus sequences are
functional, such a computer analysis can only suggest
possible regions of DNA that may be functionally important
for gene regulation.
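
A computer search of the kind described above can be sketched
in a few lines of Python. This is only an illustration of the
general technique, not the program used for the analysis. The
function name, the TATA-like motif TATAWAW and the short test
sequence are assumptions chosen for the example and are not
taken from the Locker and Buzard dictionary or from Seq IDs 1
and 2.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_consensus(seq, consensus):
    """Return 0-based start positions of every match to an IUPAC consensus.

    A lookahead is used so that overlapping matches are also reported.
    Only the given strand is searched.
    """
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


if __name__ == "__main__":
    # Hypothetical test sequence and motif, for illustration only.
    print(find_consensus("GGTATAAAAGGCTATATAT", "TATAWAW"))
```

As the text notes, a hit from such a search only suggests a
candidate region; whether it is functional must be tested
experimentally, for example with reporter constructs.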

Some examples of the consensus sequences that are
present in the CEA sequence are shown in Figure 3. Four
copies of a lysozymal silencer consensus sequence have been
found in the CEA sequence. Inclusion of one or more copies
of this consensus sequence in the molecular chimera can
help optimize CEA-specific expression. A cluster of
topoisomerase II cleavage consensus sequences identified
approximately 4-5 kb upstream of the CEA transcriptional
start suggests that this region of CEA sequence may contain
important transcriptional regulatory signals that may help
optimize CEA-specific expression.

The first fragment of CEA genomic sequence analyzed for
transcriptional activity extends from -299 to +69, but it
is appreciated by those skilled in the art that other
fragments are tested in order to isolate a TRS that directs
strong expression in CEA-positive cells but little
expression in CEA-negative cells. As diagrammed in Figure
4, the 943 bp SmaI-HindIII fragment of plasmid 39-5-5 was
subcloned into the SmaI-HindIII sites of vector pBS+
(Stratagene Cloning Systems) creating plasmid 96-11.
Single-stranded DNA was rescued from cultures of XL1-blue
96-11 using an M13 helper virus by standard techniques.
Oligonucleotide CR70,
5'-CCTGGAACTCAAGCTTGAATTCTCCACAGAGGAGG-3' (SEQ ID NO: 5), was
used as a primer for oligonucleotide-directed mutagenesis
to introduce HindIII and EcoRI restriction sites at +65.
Clone 109-3 was isolated from the mutagenesis reaction and
was verified by restriction and DNA sequence analysis to
contain the desired changes in the DNA sequence. CEA
genomic sequences -299 to +69, original numbering Figure 3,
were isolated from 109-3 as a 381 bp EcoRI/HindIII
fragment. Plasmid pRc/CMV (Invitrogen Corporation, San
Diego, CA, USA) was restricted with AatII and HindIII and
the 4.5 kb fragment was isolated from low melting point
agarose by standard techniques. The 4.5 kb fragment of
pRc/CMV was ligated to the 381 bp fragment of 109-3 using
T4 DNA ligase. During this ligation the compatible HindIII
ends of the two different restriction fragments were
ligated. Subsequently the ligation reaction was
supplemented with the four deoxynucleotides, dATP, dCTP,
dGTP, and dTTP, and T4 DNA polymerase in order to blunt the
non-compatible AatII and EcoRI ends. After incubating,
phenol extracting, and ethanol precipitating the reaction,
the DNAs were again incubated with T4 DNA ligase. The
resulting plasmid, pCR92, allows the insertion of any
desired coding sequence into the unique HindIII site
downstream of the CEA TRS, upstream from a polyadenylation
site and linked to a dominant selectable marker. The
coding sequence for CD or other desirable effector or
reporter gene, when inserted in the correct orientation
into the HindIII site, is transcriptionally regulated by
the CEA sequences and is preferably expressed in cells
that express CEA but not in cells that do not express CEA.



In order to determine the optimized CEA TRS, other
reporter gene constructs containing various fragments of
CEA genomic sequences are made by standard techniques from
DNA isolated from any of the CEA genomic clones (Figures 1,
2, 4, and 5). DNA fragments extending from the HindIII
site introduced at position +65 (original numbering Figure
3A) and numerous different upstream sites are isolated and
cloned into the unique HindIII site in plasmid
pSVOALdelta5' (De Wet, J.R., et al., Mol. Cell. Biol., 7,
725-737 (1987) which is herein incorporated by reference in
its entirety) or any similar reporter gene plasmid to
construct luciferase reporter gene constructs, Figures 4
and 5. These and similar constructs are used in transient
expression assays performed in several CEA-positive and
CEA-negative cell lines to determine a strong, CEA-positive
cell-type specific TRS. Figures 5B, 5C, and 5D show the
results obtained from several CEA/luciferase reporter
constructs. The optimized TRS is used to regulate the
expression of CD or other desirable gene in a cell-type
specific pattern in order to be able to specifically kill
cancer cells. The desirable expression cassette is added
to a retroviral shuttle vector to aid in delivery of the
expression cassette to cancerous tissue.
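
The mutagenic oligonucleotide CR70 (SEQ ID NO: 5) described
above is stated to introduce HindIII and EcoRI sites. A quick
way to check that an oligonucleotide carries the intended
recognition sequences is a plain string search, sketched below
in Python. The function name is hypothetical; the recognition
sequences AAGCTT (HindIII) and GAATTC (EcoRI) are the standard
ones for these enzymes.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"HindIII": "AAGCTT", "EcoRI": "GAATTC"}

# Oligonucleotide CR70 as given in the text (SEQ ID NO: 5), written 5' to 3'.
CR70 = "CCTGGAACTCAAGCTTGAATTCTCCACAGAGGAGG"


def find_sites(seq, sites=SITES):
    """Map each enzyme name to the 0-based positions of its site in seq."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        i = seq.find(motif)
        while i != -1:
            positions.append(i)
            i = seq.find(motif, i + 1)  # step by one to allow overlaps
        hits[name] = positions
    return hits


if __name__ == "__main__":
    print(find_sites(CR70))
```

In CR70 the HindIII site is immediately followed by the EcoRI
site, flanked on each side by sequence that can anneal to the
template.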

Strains containing plasmids 39-5-5 and 39-5-2 were
deposited at the ATCC under the Budapest Treaty with
Accession Nos. 68904 and 68905, respectively. A strain
containing plasmid pCR92 was deposited with the ATCC under
the Budapest Treaty with Accession No. 68914. A strain
containing plasmid pCR145 was deposited at the ATCC under
the Budapest Treaty with Accession No. 69460.

B) Cloning and isolation of the E. coli gene encoding
cytosine deaminase (CD)
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The cloning, sequencing and expression of E. coli CD 
has already been published (Austin & Huber, Molecular
Pharmacology, 43, 380-387 (1993) the disclosure of which
is incorporated herein by reference). A positive genetic
selection was designed for the cloning of the codA gene
from E. coli. The selection took advantage of the fact
that E. coli is only able to metabolize cytosine via CD.
Based on this, an E. coli strain was constructed that could
only utilize cytosine as a pyrimidine source when cytosine
deaminase was provided in trans. This strain, BA101,
contains a deletion of the codAB operon and a mutation in
the pyrF gene. The strain was created by transducing a
pyrF mutation (obtained from the E. coli strain X32 (E.
coli Genetic Stock Center, New Haven, CT, USA)) into the
strain MBM7007 (W. Dallas, Burroughs Wellcome Co., NC, USA)
which carried a deletion of the chromosome from lac to
argF. The pyrF mutation confers a pyrimidine requirement
on the strain, BA101. In addition, the strain is unable to
metabolize cytosine due to the codAB deletion. Thus, BA101
is able to grow on minimal medium supplemented with uracil
but is unable to utilize cytosine as the sole pyrimidine
source.
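
The logic of this positive selection can be summarised as a
small truth table. The Python sketch below is a deliberately
simplified model of the growth phenotypes described above; the
function and parameter names are hypothetical, and the model
ignores everything except the pyrimidine supply.

```python
from typing import Optional


def can_grow(pyrF_functional: bool, cd_available: bool,
             supplement: Optional[str]) -> bool:
    """Toy model of growth on minimal medium.

    pyrF_functional: True if de novo pyrimidine synthesis is intact.
    cd_available:    True if cytosine deaminase is present (chromosomal
                     codA or a plasmid supplying it in trans).
    supplement:      "uracil", "cytosine", or None.
    """
    if pyrF_functional:
        return True               # no pyrimidine supplement needed
    if supplement == "uracil":
        return True               # uracil satisfies the requirement directly
    if supplement == "cytosine":
        return cd_available       # cytosine is usable only via CD
    return False


if __name__ == "__main__":
    # BA101: pyrF mutant carrying the codAB deletion.
    print(can_grow(False, False, "uracil"))    # grows
    print(can_grow(False, False, "cytosine"))  # does not grow
    # BA101 transformed with a plasmid that supplies CD in trans.
    print(can_grow(False, True, "cytosine"))   # grows
```

Only the last case, BA101 carrying a CD-encoding fragment,
grows with cytosine as the sole pyrimidine source, which is
what makes the selection positive.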

The construction of BA101 provided a means for
positive selection of DNA fragments encoding CD. The
strain, BA101, was transformed with plasmids carrying
inserts from the E. coli chromosome and the transformants
were selected for growth on minimal medium supplemented
with cytosine. Using this approach, the transformants were
screened for the ability to metabolize cytosine, indicating
the presence of a DNA fragment encoding CD. Several sources
of DNA could be used for the cloning of the codA gene: 1) a
library of the E. coli chromosome could be purchased
commercially (for example from Clontech, Palo Alto, CA, USA
or Stratagene, La Jolla, CA, USA) and screened; 2)
chromosomal DNA could be isolated from E. coli, digested
with various restriction enzymes and ligated into plasmid
DNA with compatible ends before screening; and/or 3)
bacteriophage lambda clones containing mapped E. coli
chromosomal DNA inserts could be screened.

Bacteriophage lambda clones (Y. Kohara, National
Institute of Genetics, Japan) containing DNA inserts
spanning the 6-8 minute region of the E. coli chromosome
were screened for the ability to provide transient
complementation of the codA defect. Two clones, 137 and
138, were identified in this manner. Large-scale
preparations of DNA from these clones were isolated from
500 ml cultures. Restriction enzymes were used to generate
DNA fragments ranging in size from 10-12 kilobases. The
enzymes used were EcoRI, EcoRI and BamHI, and EcoRI and
HindIII. DNA fragments of the desired size were isolated
from preparative agarose gels by electroelution. The
isolated fragments were ligated to pBR322 (Gibco BRL,
Gaithersburg, MD, USA) with compatible ends. The resulting
ligation reactions were used to transform the E. coli
strain, DH5a (Gibco BRL, Gaithersburg, MD, USA). This step
was used to amplify the recombinant plasmids resulting from
the ligation reactions. The plasmid DNA preparations
isolated from the ampicillin-resistant DH5a transformants
were digested with the appropriate restriction enzymes to
verify the presence of insert DNA. The isolated plasmid
DNA was used to transform BA101. The transformed cells
were selected for resistance to ampicillin and for the
ability to metabolize cytosine. Two clones were isolated,
pEA001 and pEA002. The plasmid pEA001 contains an
approximately 10.8 kb EcoRI-BamHI insert while pEA002
contains an approximately 11.5 kb EcoRI-HindIII insert.
The isolated plasmids were used to transform BA101 to
ensure that the ability to metabolize cytosine was the
result of the plasmid and not due to a spontaneous
chromosomal mutation.



A physical map of the pEA001 DNA insert was generated
using restriction enzymes. Deletion derivatives of pEA001
were constructed based on this restriction map. The
resulting plasmids were screened for the ability to allow
BA101 to metabolize cytosine. Using this approach, the
codA gene was localized to a 4.8 kb EcoRI-BglII fragment.
The presence of codA within these inserts was verified by
enzymatic assays for CD activity. In addition, cell
extracts prepared for enzymatic assay were also examined by
polyacrylamide gel electrophoresis. Cell extracts that
were positive for enzymatic activity also had a protein
band migrating with an apparent molecular weight of 52,000.



The DNA sequence of both strands was determined for a
1634 bp fragment. The sequence determination began at the
PstI site and extended to the PvuII site, thus including
the codA coding domain. An open reading frame of 1283
nucleotides was identified. The thirty amino terminal
amino acids were confirmed by protein sequencing.
Additional internal amino acid sequences were generated
from CNBr-digestion of gel-purified CD.
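
Identifying an open reading frame in a newly determined
sequence is a mechanical scan for a start codon followed in
frame by a stop codon. The Python sketch below shows one simple
way to do this; it is not the method used by the authors. It
scans the given strand only, accepts GTG as well as ATG as a
start codon (the native codA start is described in this
document as GTG), and is run on a made-up 16-base toy sequence
rather than on codA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # GTG included as an alternative bacterial start


def longest_orf(seq):
    """Return (start, end) of the longest ORF on the given strand, or None.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    All three forward reading frames are scanned; the first start codon
    in each open stretch is used.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


if __name__ == "__main__":
    toy = "CCGTGAAACCCTAAGG"  # hypothetical sequence for illustration
    orf = longest_orf(toy)
    print(orf, toy[orf[0]:orf[1]])
```

A predicted reading frame still needs independent support,
which in the work described here came from protein sequencing
of the amino terminus and of internal CNBr fragments.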

A 200 bp PstI fragment was isolated that spanned the
translational start codon of codA. This fragment was
cloned into pBS+. Single-stranded DNA was isolated from a
30 ml culture and mutagenized using a custom
oligonucleotide BA22 purchased from Synthecell Inc.,
Rockville, MD, USA and the oligonucleotide-directed
mutagenesis kit (Amersham, Arlington Heights, IL, USA).
The base changes result in the introduction of a HindIII
restriction enzyme site for joining of CD with CEA TRS and
in a translational start codon of ATG rather than GTG. The
resulting 90 bp HindIII-PstI fragment is isolated and
ligated with the remainder of the cytosine deaminase gene.
The chimeric CEA TRS/cytosine deaminase gene is created by
ligating the HindIII-PvuII cytosine deaminase-containing
DNA fragment with the CEA TRS sequences.
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The strain BA101 and the plasmids, pEAOOl and pEA003, 
were deposited with ATCC under the Budapest Treaty with
Accession Nos. 55299, 68916 and 68915 respectively.

C) Construction of transcriptional regulatory sequence of
carcinoembryonic antigen/cytosine deaminase molecular
chimera

A 1508 bp HindIII/PvuII fragment containing the coding
sequence for cytosine deaminase is isolated from the
plasmid containing the full length CD gene of Example 1B
that has been altered to contain a HindIII restriction site
just 5' of the initiation codon. Plasmid pCR92 contains
CEA sequences -299 to +69 immediately 5' to a unique
HindIII restriction site and a polyadenylation signal 3' to
a unique ApaI restriction site (Example 1A, Figure 4).
pCR92 is linearised with ApaI, the ends are blunted using
dNTPs and T4 DNA polymerase, and subsequently digested with
HindIII. The pCR92 HindIII/ApaI fragment is ligated to the
1508 bp HindIII/PvuII fragment containing cytosine
deaminase. Plasmid pCEA-1/codA, containing CD inserted in
the appropriate orientation relative to the CEA TRS and
polyadenylation signal, is identified by restriction enzyme
and DNA sequence analysis.

The optimized CEA-specific TRS, the coding sequence
for CD with an ATG translation start, and a suitable
polyadenylation signal are joined together using standard
molecular biology techniques. The resulting plasmid,
containing CD inserted in the appropriate orientation
relative to the optimized CEA-specific TRS and a
polyadenylation signal, is identified by restriction enzyme
and DNA sequence analysis.

Example 2

Construction of a retroviral shuttle vector construct
containing the molecular chimera of Example 1

The retroviral shuttle vector pL-CEA-1/codA is constructed
by ligating a suitable restriction fragment containing the
optimized CEA TRS/codA molecular chimera including the
polyadenylation signal into an appropriate retroviral
shuttle vector, such as N2(XM5) linearised at the XhoI
site, using standard molecular biology techniques. The
retroviral shuttle vector pL-CEA-1/codA is characterized by
restriction endonuclease mapping and partial DNA
sequencing.

Example 3

Virus Production of Retroviral Constructs of Example 2

The retroviral shuttle construct described in Example 2 is
placed into an appropriate packaging cell line, such as
PA317, by electroporation or infection. Drug resistant
colonies, such as those resistant to G418 when using
shuttle vectors containing the NEO gene, are single cell
cloned by the limiting dilution method, analyzed by
Southern blots, and titred in NIH 3T3 cells to identify the
highest producer of full-length virus.

Example 4

Further data on the CEA TRS

In addition to the plasmids shown in Figure 5B, the
following combinations of regions have proved particularly
advantageous for high level expression of the reporter gene
in the system described in Example 1A:

pCR177:
(-14.5kb to -10.6kb) + (-6.1kb to -3.9kb) + (-299b to +69b)

pCR176:
(-13.6kb to -10.6kb) + (-6.1kb to -3.9kb) + (-299b to +69b)

pCR165:
(-3.9kb to -6.1kb) + (4x -90b to +69b)

pCR168:
(-13.6kb to -10.6kb) + (4x -90b to +69b).
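
The combinations listed above can be held in a simple data
structure to compare the approximate amount of CEA sequence in
each construct. The Python sketch below is illustrative only.
It assumes that "4x" denotes four tandem copies, takes the
coordinates at face value as read from the list (rounded as
printed), and measures each region as the difference between
its end points, so the totals are rough estimates rather than
exact insert sizes. The function name is hypothetical.

```python
# Each region is (start_bp, end_bp, copies), relative to the CEA mRNA start.
# Coordinates are the rounded values printed in the list above.
CONSTRUCTS = {
    "pCR177": [(-14500, -10600, 1), (-6100, -3900, 1), (-299, 69, 1)],
    "pCR176": [(-13600, -10600, 1), (-6100, -3900, 1), (-299, 69, 1)],
    "pCR165": [(-3900, -6100, 1), (-90, 69, 4)],   # first region listed reversed
    "pCR168": [(-13600, -10600, 1), (-90, 69, 4)],
}


def approx_insert_bp(regions):
    """Approximate total CEA sequence in a construct, in base pairs."""
    return sum(abs(end - start) * copies for start, end, copies in regions)


if __name__ == "__main__":
    for name, regions in CONSTRUCTS.items():
        print(f"{name}: about {approx_insert_bp(regions)} bp of CEA sequence")
```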
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SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: The Wellcome Foundation Limited 

(B) STREET: Unicorn House, 160 Euston Road

(C) CITY: London

(E) COUNTRY: G.B.

(F) POSTAL CODE (ZIP): NW1 2BP

(ii) TITLE OF INVENTION: Transcriptional Regulatory Sequence
(iii) NUMBER OF SEQUENCES: 5

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11288 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iii) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

AAGCTTAAAA CCCAATGGAT TGACAACATC AAGAGTTGGA ACAAGTGGAC ATCGAGATGT 60 

TACTTGTGGA AATTTAGATG TGTTCAGCTA TCGGGCAGGA GAATCTGTGT CAAATTCCAG 120 

CATCGTTCAG AAGAATCAAA AAGTGTCACA GTCCAAATGT GCAACAGTGC AGGGGATAAA 180 

ACTCTGGTCC ATTCAAACTG AGGGATATTT TGGAACATGA GAAAGGAAGG CATTGCTCCT 240 

CCACAGAACA TGGATCATCT CACACATAGA GTTGAAAGAA AGGAGTCAAT CCCAGAAIAG 300 

AAAAXGATCA CTAATTCCAC CTCTATAAAG TTTCCAAGAG GAAAACCCAA TTCTGCTGCT 360 

AGAGATCAGA ATGGAGGTGA CCTGTGCCTT GCAATGGCTG TGAGGGTCAC GGGAGTGTCA 420 

CTTAGTGCAG GCAATGTGCC GTATCTTAAT CTGGGCAGGG CTTTCATGAG CACATAGGAA 480 

TCCAGACATT ACTGCTGTGT TCATTTTACT TCACCGGAAA AGAAGAATAA AATCAGCCGG 540

GCGCGGTGGC TCACGCCTGT AATCCCAGCA CTTTAGAAGG CTGAGGTGGG CAGATTACTT 600

CACGTCAGGA GTTCAAGACC ACCCTGGCCA ATATGGTGAA ACCCCGGCTC TACTAAAAAT 660 

ACAAAAATTA GCTGGGCATG GTGGTGCGCG CCTGTAATCC CAGCTACTCG GGAGGCTGAG 720

GCTGGACAAT XGCTTGGACC CAGGAAGGAG AG3TTGCAG? GAGCCAAGAX TGTGCCACSS 780

CACTCCAGCT TGGGCAACAG ACCCAGACTC rS7AAAAAAA AAAAAAAAAA AAAAAAAAAG 840

AAAGAAAGAA AAAGAAAAGA AAGTATAAAA rcrCTTTGGG TTAACAAAAA AAGATCCACA 900

AAACAAACAC CAGCTCTTA? CAAACTTACA CAACTCTGCC AGAGAACAGG AAACACAAAI 960

ACTCArTAAC TCACTTTTG? GGCAATAAAA ccrrcATGTC AAAAGGAGAC CAGGACACAA 1020

TGAGGAAGTA AAACTGCAGC CCCTACTTGG 57GCAGAGAG GGAAAATCCA CAAATAAAAC 1080

ATTACCAGAA GGAGCTAAGA TTTACTGCAT rGAGTTCATT CCCCAGGTAT GCAAGG7GAT 1140

TTTAACACCT                       ACCACATAGA CAG&STAGCT AGAAAAAAAT 1200

TACAACTAGC AGAACAGAAG CAA7TTGCCC rrCCTAAAAT TCCACATCAT ATCATCATGA 1260

TGGAGACAGT GCAGACSCCA ATGACAATAA AAAGAGGGAC CTCCS7CACC CGGTAAACAT 1320

GTCCACACAG CTCCAGCAAG CACCCGTCTT CCCAG7GAAT CACTCTAACC TCCCC7T2AA 1380

TCAGCCCCAG GCAAGGC7GC CTCCGATGGC CACACAGGCT CCAACGCGTG GGCC7CAACC 1440

TCCCGCAGAG GCTcrccrrr ggccacccca rGGGGAGAGC ACGAGGACAG GGCAGAGCCC 1500

TCTGATGCCC ACACATGGCA GCAGCTGACG CCAGAGCCAT GGGGGCTGGA GAGCAGAGCT 1560

GCTGGGGTCA GAGCTTCCTG AGGACACCCA GGCCTAAGGG AAGGCAGCTC CCTGGATCGG 1620

GGCAACCAGG CTCCGGGCTC CAACCTCAGA CCCCGCATGG GAGGAGCCAG CACTCTAGGC 1680

CTTTCCTAGG GTGACTCTGA GGGGACCCTG ACACGACAGG ATCSCTGAAT GCACCCSAGA 1740

TGAAGGGGCC ACCACGGGAC CCTGCTCTCG CGGCAGATCA GGAGAGACTG GGACACCAXG 1800

CCAGGCCCCC ATGCCAXGGC TGCGACTGAC CCAGGCCACT CCCCTGCATG CATCAGCCTC 1860

GGTAAGTCAC ATGACCAAGC CCAGGACCAA iGTGGAAGGA AGGAAACAGC ATCCCCTTTA 1920

GTGATGGAAC CCAAGGTCAG TGCAAAGAGA GCCCATGAGC AGTTAGGAAG GGTGGTCCAA 1980

CCTACAGCAC AAACCAXCAT CTATCATAAG TAGAAGCCCT GCTCCATGAC CCCTGCATTT 2040

AAAIAAACGT TTGTTAAAIG AGTCAAATTC CCTCACCATG AGAGCXCACC TGTGTGTAGG 2100

CCCATCACAC ACACAAACAC ACACACACAC ACACACACAC ACACACACAC ACACAGGGAA 2160

AGTGCAGGAT CCTGGACAGC ACCAGGCAGG CITCACAGGC AGAGCAAACA GCGTGAATGA 2220

CCCATGCAGT GCCCTGGGCC CCATCAGCTC AGAGACCCTG TGAGGGCTGA GATGGGGCXA 2280

GGCAGGGGAG AGACTTAGAG AGGGTGGGGC CTCCAGGGAG GGGGCTGCAG GGAGCTGCGT 2340

ACTGCCCTCC AGGGAGGGGG CTGCAGGGAG CTGGGTACTG CCCTCCAGGG AGGGGGCTGC 2400

AGGGAGCTCG GTACTGCCCT CCAGGGAGGG GGCTGCAGGG AGCTGGGTAC TGCCCTCCA6 2460

GCAGGGGGCT GCAGGGAGCT GGGTACTGCC CTCCAGGGAG GCAGGAGCAC TGTTCCCAAC 2520

AGAGAGCACA TCTTCCTGCA GCAGCTGCAC AGACACAGGA GCCCCCATGA CTGCCCtGGG 2580

CCAGGGTGTG GATTCCAAAT TTCGTGCCCC ATTGGGTGGG ACGGAGGTTG ACCGTGACAT 2640

CCAAGGGGCA TCTGTGATTC CAAACTTAAA CTACTGTGCC TACAAAATAG GAAATAACCC 2700

TACTTTTTCT ACTATCTCAA ATTCCCTAAG CACAAGCTAG CACCCTTTAA ATCAGGAAGT 2760

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TCAGTCACTC CTGGGGXCCT CCCAXGCGeC CAGXCXGACT XGCAGGTGCA eAGGGXGGCX 2820 
GACAXCXGXC CXXGCXCCXC CXCXXGGCXC AACXGCCGCC CCTCCXGGGG GXGACXGAXG 2880
GXCAGGACAA GGGAXCCT?.G AGCXGGCCCC ATGAXTGACA GGAACGCAGG ACXXGGCCXC 2940 
CAXXCXGAAG ACTAGGGG7G XCAAGAGAGC rGGGCAICCC ACAGAGCXGC ACAAGA2GAC 3000 
GCGGACAGAG GGTGACACAG GGCXCAGGGC XXCACACGGG XCGGGAGGCX CAGCTGAGAG 3060 
TTCAGGGACA GACCTGAGGA GCCXCAGTGG GAAAAGAAGC ACXGAAGXGG GAAG7CCTGG 3120 
AAXGXXCXGG ACAAGCCXGA GTGCTCTAAG GAAATGCTCC CACCCCGAXG XAGCCXGCAG 3180 
CACTGGACGG XCXGXGXACC XCCCCGCXGC CCAXCCXCXC ACAGCCCCCG C CTCTAGGGA 3240 
CACAACXCCX GCCCXAACAX GCAICXXTCC TGTCXCAXXC CACACAAAAG GGCCXCXGGG 3300 
GICCCXGXXC XGCA7XGCAA GGAGTGGAGG XCACGXXCCC ACAGACCACC CAGCAACAGG 3360 
GTCCXAXGGA GGTGCGGTCA GGAGGATCAC ACGXCCCCCC AXGCCCAGGG GACXGACXCX 3420 
GGGGGXGAXC GAXXGGCCXG GAGGCCACXG GXCCCCXCTG XCCCXGAGGG GAAXCXGCAC 3480 
CCXGGAGGCX GCCACAXCCC XCCXGAXXCT XXCAGCXGAG GGCCCXXCXX GAAAXCCCAG 3540 
GCAGGACXCA ACCCCCACXG GGAAAGGCCC AG7GXGGACG GXXCCACAGC AGCCCAGCTA 3600 
ACGCCCXXGG ACACAGAXCC TGAGTGAGAG AACCTXTAGG GACACAGGTG CACGGCCAXG 3660 
XCCCCAGXGC CCACACAGAG CAGGGGCATC XGGACCCXGA GXGXGXAGCX CCCGCGACTG 3720 

AACCCAGCCC TTCCCCAATG ACGTGACCCC TGGGGTGGCT CCAGGTCTCC AGTCCATGCC 3780

ACCAAAATCT CCAGATTGAG GGTCCTCCCT TGAGTCCCTG ATGCCTGTCC AGGAGCTGCC 3840

CCCTGAGCAA ATCTAGAGTG CAGAGGGCTG GGATTGTCGC AGTAAAAGCA GCCACATTTG 3900

TCTCAGGAAG CAAAGGGAGG ACATGAGCTC CAGGAAGGGC GATGGCGTCC TCTAGTGGGC 3960

GCCTCCTGTT AATGAGCAAA AAGGGGCCAG GAGAGTTGAG AGATCAGGGC TGGCCTTGGA 4020

CTAAGGCTCA GATGGAGAGG ACTGAGGTGC AAAGAGGGGG CTGAAGTAGG GGAGTGGTCG 4080

GGAGAGATGG GAGGAGCAGG TAAGGGGAAG CCCGAGGGAG GCCGCGGGAG GGTACAGCAG 4140

AGCTCTCCAC TCCTCAGCAT TCACATTTGG GGTGGTCGTG CTAGTGGGGT TCTGTAAGTT 4200

GTAGGGTGTT CAGCACCATC TGGGGACTCT ACCCACTAAA TGCCAGCAGG ACTCCCTCCC 4260

CAAGCTCTAA CAACCAACAA TGTCTCCAGA CTTTCCAAAT GTCCCCTGGA CAGCAAAATT 4320

GCTTCTGGCA GAATCACTGA TCTACGTCAG TCTCTAAAAG TGACTCATCA GCGAAATCCT 4380

TCACCTCTTC CGAGAAGAAT CACAAGTGTG AGAGGGGTAG AAACTGCAGA CTTCAAAATC 4440

TTTCCAAAAG AGTTTTACTT AATCAGCAGT TTGATGTCCC AGGAGAAGAT ACATTTAGAG 4500

TGTTTAGAGT TGATGCCACA TGGCTCCCTG TACCTCACAG CAGGAGCAGA GTGGGTTTTC 4560

CAAGCGCCTG TAACCACAAC TGGAATGACA CTCACTGGGT TACATTACAA AGTGGAATGT 4620

GGGGAATTCT GTAGACTTTG GGAAGGGAAA TGTATGACGT CAGCCCACAG CCTAAGGCAG 4680

TGGACAGTCC ACTTTGAGGC TCTCACCATC TAGGAGACAT CTCAGCCATG AACATAGCCA 4740

CATCTGTCAT TAGAAAACAT GTTTTATTAA GAGGAAAAAT CTAGCCTAGA AGTCCTTTAT 4800

GCTCTTTTTT CTCTTTATGT TCAAATTCAT ATACTTTTAG ATCATTCCTT AAAGAAGAAT 4860

CTATCCCCCT AAGTAAATGT TATCACTGAC TGGATAGTGT TGGTGTCTCA CTCCCAACCC 4920

CTGTGTGGTG ACAGTGCCCT GCTTCCCCAG CCCTGGGCCC TCTCTGATTC CTGAGAGCTT 4980

TGGGTGCTCC TTCATTAGGA GGAAGAGAGG AAGGGTGTTT TTAATATTCT CACCATTCAC 5040

CCATCCACCT CTTAGACACT GGGAAGAATC AGTTGCCCAC TCTTGGATTT GATCCTCGAA 5100

TTAATGACCT CTATTTCCGT CCCTTGTCCA TTTCAACAAT GTGACAGGCC TAACAGGTGC 5160

CTTCTCCATG TGATTTTTGA GGAGAAGGTC CTCAAGATAA GTTTTCTCAC ACCTCTTTGA 5220

ATTACCTCCA CCTGTGTCCC CATCACCATT ACCAGCAGCA TTTGGACCCT TTTTCTGTTA 5280

GTCAGATGCT TTCCACCTCT TGAGGGTGTA TACTGTATGC TCTCTACACA GGAACATGCA 5340

GAGGAAATAG AAAAAGGGAA ATCGCATTAC TATTCAGAGA GAAGAAGACC TTTATGTGAA 5400

TGAATGAGAG TCTAAAATCC TAAGAGAGCC CATATAAAAT TATTACCAGT GCTAAAACTA 5460

CAAAAGTTAC ACTAACAGTA AACTAGAATA ACAAAACATG CATCACAGTT GCTGGTAAAG 5520

CTAAATCAGA TATTTTTTTC TTAGAAAAAG CATTCCATGT GTGTTGCAGT GATGACAGGA 5580

GTGCCCTTCA GTCAATATGC TGCCTGTAAT TCTTGTTCCC TGGCAGAATG TATTGTCTTT 5640

TCTCCCTTTA AATCTTAAAT GCAAAACTAA AGGCAGCTCC TGGGCCCCCT CCCCAAAGTC 5700

AGCTGCCTGC AACCAGCCCC ACGAAGAGCA GAGGCCTGAG CTTCCCTGGT CAAAATAGGG 5760

GGCTAGGGAG CTTAACCTTG CTCGATAAAG CTGTGTTCCC AGAATGTCGC TCCTGTTCCC 5820

AGGGGCACCA GCCTGGAGGG TGGTGAGCCT CACTGGTGGC CTGATGCTTA CCTTGTCCCC 5880

TCACACCAGT GGTCACTGGA ACCTTGAACA CTTGGCTGTC GCCCGGATCT GCAGATGTCA 5940

AGAACTTCTG GAAGTCAAAT TACTGCCCAC TTCTCCAGGG CAGATACCTG TGAACATCCA 6000

AAACCATGCC ACAGAACCCT GCCTGGGGTC TACAACACAT ATGGACTGTG AGCACCAAGT 6060

CCAGCCCTCA ATCTGTGACC ACCTGCCAAG ATGCCCCTAA CTGGGATCCA CCAATCACTG 6120

CACATGGCAG GCAGCGAGGC TTGGAGGTGC TTCGCCACAA GGCAGCCCCA ATTTGCTGGG 6180

AGTTTCTTGG CACCTGGTAG TGGTGAGGAG CCTTGGGACC CTCAGGATTA CTCCCCTTAA 6240

GCATAGTGCG GACCCTTCTG CATCCCCAGC AGGTGCCCCG CTCTTCAGAG CCTCTCTCTC 6300

TCAGGTTTAC CCAGACCCCT GCACCAATGA CACCATGCTG AAGCCTCAGA GAGAGAGATG 6360

GAGCTTTCAC CAGCACCCCC TCTTCCTTGA GGGCCAGGGC ACGGAAAGCA GGAGGCAGCA 6420

CCAGGAGTGG GAACACCAGT GTCTAAGCCC CTGATGAGAA CAGGGTGGTC TCTCCCATAT 6480

GCCCATACCA GGCCTGTGAA CAGAATCCTC CTTCTGCAGT GACAATGTCT GAGAGGACGA 6540

CATGTTTCCC AGCCTAACGT GCAGCCATCC CCATCTACCC ACTGCCTACT GCAGGACAGC 6600

ACCAACCCAG GAGCTGGGAA GCTGGGAGAA GACATGGAAT ACCCATGGCT TCTCACCTTC 6660

CTCCAGTCCA GTGGGCACCA TTTATGCCTA CCACACCCAC CTGCCGGCCC CAGGCTCTTA 6720

ACAGTTAGCT CACCTACGTG CCTCTGGGAG GCCGAGGCAG GAGAATTGCT TGAACCCGGG 6780

AGGCAGAGGT TGCAGTCAGC CGAGATCACA CCACTGCACT CCAGCCTGGG TGACAGAATG 6840

AGACTCTGTC TCAAAAAAAA AGAGAAAGAT AGCATCAGTG GCTAGCAAGG GCTAGGGGCA 6900

GGGGAAGGTG GAGAGTTAAT GATTAATAGT ATGAAGTTTC TATGTGAGAT GATGAAAATG 6960

TTCTGGAAAA AAAAATATAG TGGTGAGGAT GTAGAATATT GTGAATATAA TTAACGGCAT 7020

TTAATTGTAC ACTTAACATG ATTAATGTGG CATATTTTAT CTTATGTATT TGACTACATC 7080

CAAGAAACAC TGGGAGAGGG AAAGCCCACC ATGTAAAATA CACCCACCCT AATCAGATAG 7140

TCCTCATTGT ACCCAGGTAC AGGCCCCTCA TSACCTGCAC AGGAATAACT AAGGATTTAA 7200

GGACATGAGG CTTCCCAGCC AACTGCAGGT 3CACAACATA AATGTATCTG CAAACAGnCT 7260

GAGAGTAAAG CTGGGGGCAC AAACCTCAGC ACTGCCAGGA CACACACCCT TCTCGTGGAT 7320

TCTGACTTTA TCTGACCCGG CCCACTGTCC AGATCTTGTT GTGGGATTGG GACAAGGGAG 7380

GTCATAAAGC CTGTCCCCAG GGCACTCTGT GTSAGCACAC GAGACCTCCC CACCCCCCCA 7440

CCGTTAGGTC TCCACACATA GATCTGACCA TTAGGCATTG TGAGGAGGAC TCTAGCGCGG 7500

GCTCAGGGAT CACACCAGAG AATCAGGTAC AGAGAGGAAG ACGGGGCTCG AGGAGCTGAT 7560

GGATGACACA GAGCAGGGTT CCTGCAGTCC ACAGGTCCAG CTCACCCTGG TGTAGGTGCC 7620

CCATCCCCCT GATCCAGGCA TCCCTGACAC AGCTCCCTCC CGGAGCCTCC TCCCAGGTGA 7680

CACATCAGGG TCCCTCACTC AAGCTGTCCA SAGAGGGCAG CACCTTGGAC AGCGCCCACC 7740

CCACTTCACT CTTCCTCCCT CACAGGGCTC AGGGCTCAGG GCTCAAGTCT CAGAACAAAT 7800

GGCAGAGGCC AGTGAGCCCA GAGATGGTGA CAGGGCAATG ATCCAGGGGC AGCTGCCTGA 7860

AACGGGAGCA GGTGAAGCCA CAGATGGGAG AAGATGGTTC AGGAAGAAAA ATCCAGGA&T 7920

GGGCAGGAGA GGAGAGGAGG ACACAGGCTC TGTGGGCCTG CAGCCCAGGA TGGGACTAAG 7980

TGTGAAGACA TCTCAGCAGG TGAGGCCAGG TCCCATCAAC AGAGAAGCAG CTCCCACCTC 8040

CCCTGATCCA CGCACACACA GAGTCTGTGC TCCTGTGCCC CCACAGTCGG CCTCTCCTGT 8100

TCTGGTCCCC AGCGAGTGAG AAGTGAGGTT GACTTGTCCC TGCTCCTCTC TGCTACCCCA 8160

ACATTCACCT TCTCCTCATG CCCCTCTCTC TCAAATATGA TTTGGATCTA TGTCCCCGCC 8220

CAAATCTCAT GTCAAATTGT AAACCCCAAT GTTGGAGGTG GCGCCTTGTG AGAAGT6ATT 8280

GGATAATCCG GGTGGATTTT CTGCTTTGAT GCTGTTTCTG TGATAGAGAT CTCACATGAT 8340

CTGGTTGTTT AAAAGTCTCT AGCACCTCTC CCCTCTCTCT CTCTCTCTCT TACTCATGCT 8400

CTCCCATGTA AGACGTTCCT GTTTCCCCTT CACCGTCCAG AATGATTGTA AGTTTTCTGA 8460

GGCCTCCCCA GGAGCAGAAG CCACTATGCT TCCTGTACAA CTGCAGAATG ATGAGCGAAT 8520

TAAACCTCTT TTCTTTATAA ATTACCCAGT CTCAGGTATT TCTTTATAGC AATGCGAGGA 8580

CAGACTAATA CAATCTTCTA CTCCCAGATC CCCGCACACG CTTAGCCCCA GACATCACTG 8640

CCCCTCGGAG CATGCACAGC GCAGCCTCCT CCCGACAAAA GCAAACTCAC AAAAGGTGAC 8700

AAAAATCTCC ATTTGGGGAC ATCTGATTGT GAAAGAGGGA GGACAGTACA CTTGTAGCCA 8760

CAGAGACTGG GGCTCACCGA GCTGAAACCT GCTAGCACTT TGGCATAACA TGTGCATCAC 8820

CCCTGTTCAA TGTCTAGAGA TCAGTGTTGA GTAAAACAGC CTGGTCTGGG GCCGCTCCTG 8880
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TCCCCACTTC CCTCCTGTCC ACCAGAGGGC GGCAGAGTTC CTCCCACCCT GGAGCCSCGC 8940 

CAGGGGCTGC TGACCTCGC? CAGCCGGGCC CACAGCCCAG CAGCGTCCAC CCTCACCCGG 9000

GTCACCTCGG CCCACGZCCT CCTCGCCCTC CGAGCTCCTC ACACCGACTC TGTCAGCTCC 9060 

TCCCTGCAGC CTATCGGCCG CCCACCTGAG GCTTGTCGGC CGCCCACTTG AGGCC737CG 9120 

GCTGCCCTCT GCAGGCAGCT CCTGTCCCCT ACACCCCCTC CTTCCCCGGG CTCAGCTGAA 9180

AGGGCGTCTC CCAGGGCAGC TCCCTGTGAT CTCCAGGACA GCTCAGTCTC TCACAGGCTC 9240

CGACGCCCCC TATGCCGTCA CCTCACAGCC CCGTCATTAC CATTAACTCC TCAGCCCCAT 9300

GAAGTTCACT GAGCGCGTGT CTCCCGGTTA CAGGAAAACT CTGTGACAGG GACCAC3TCT 9360

GTCCTGCTCT CTGTGGAATC CCAGGGCCCA GCCCAGTGCC TGACACGGAA CAGAT3CTCC 9420

ATAAATACTG GTTAAA~GTS TGGGAGATCT CTAAAAAGAA GCATATCACC TCCGTGTGSC 9480

CCCCAGCAGT CAGAGTCT3T TCCA?GTGGA CACAGGGGCA CTGGCACCAG CATGGGAGGA 9540

GGCCAGCAAG TGCCCGCGGC TGCCCCAGGA ATGAGGCCTC AACCCCCAGA GCTTCAGAAG 9600 

GGAGGACAGA GGCCTGCAGG GAATAGATCC TCC3CCCTGA CCCCGCAGCC TAATCCAGAG 9660 

TTCAGGGTCA GCTCACACCA CGTCGACCCT GGTCAGCATC CCTAGGGCAG TTCCAGACAA 9720 

GGCCGGAGGT CTCCTCTTGC CCTCCAGGGG GTGACATTGC ACACAGACAT CACTCAGGAA 9780 

ACGGATTCCC CTGGACAGGA ACCTGGCTTT GCTAAGGAAG TGGAGGTGGA CCCTGGTTTC 9840

CATCCCTTGC TCCAACAGAC CCTTCTGATC TCTCCCACAT ACCTGCTCTG TTCCTTTCTG 9900 

GGTCCTATGA GGACCCTGTT CTGCCAGGGG TCCCTGTGCA ACTCCAGACT CCCTCCTGGT 9960 

ACCACCATGG GGAAGGTGGG GTCATCACAG GACAGTCAGC CTCGCAGAGA CAGAGACCAC 10020 

CCAGGACTGT CAGGGAGAAC ATGGACAGGC CCTGAGCCGC AGCTCAGCCA ACAGACACGG 10080 

AGAGGGAGGG TCCCCCTGGA GCCTTCCCCA AGGACAGCAG AGCCGAGAGT CACCCACCTC 10140 

CCTCCACCAC AGTCCTCTCT TTCCAGCACA CACAAGACAC CTCCCCCTCC ACATGCAGGA 10200 

TCTGGGGACT CCTGAGACCT CTGGGCCTGG GTCTCCATCC CTGGGTCAGT GGCGCGGTTG 10260

GTGGTACTGG AGACAGAGGG CTGGTCCCTC CCCAGCCACC ACCCAGTGAG CCTTTTTCTA 10320

GCCCCCACAG CCACCTCTGT CACCTTCCTG TTGGGCATCA TCCCACCTTC CCAGAGCCCT 10380 

GGAGAGCATG GGGAGACCCG GGACCCTGCT CGGTTTCTCT CTCACAAAGG AAAATAATCC 10440 

CCCTGGTGTG ACAGACCCAA GGACAGAACA CAGCAGAGGT CAGCACTGGG GAAGAGAGGT 10500

TGTCCTCCCA GGGGATGGGG GTCCATCCAC CTTGCCGAAA AGATTTGTCT GAGGAACTGA 10560 

AAATAGAAGG GAAAAAAGAG GAGGGACAAA AGAGGCAGAA ATGAGAGGGG AGGGGACAGA 10620 

GGACACCTGA ATAAAGACCA CACCCATGAC CCACGTGATG CTGAGAAGTA CTCCTGCCCT 10680 

AGCAAGAGAC TCAGGGCAGA GGGAGGAAGG ACAGCAGACC AGACAGTCAC AGCAGCCTTG 10740

ACAAAACGTT CCTGGAACTC AAGCTCTTCT CCACAGAGGA GGACAGAGCA CACAGCAGAG 10800 

ACCATGGAGT CTCCCTCCGC CCCTCCCCAC AGATGCTGCA TCCCCTGGCA CACGCTCCTG 10860 

CTCACAGGTG AAGGGAGGAC AACCTGGGAG AGGGTGGGAG GAGGCAGCTC GGGTCTCCTG 10920

GGTAGGACAG GGCTGTGAGA CGGACAGAGG SGTCCTGTTC GAGCCTGAAT AGGGAAGAGG 10980

ACATCAGAGA GGGACAGGAG TCACACCAGA AAAATCAAAT TGAACTGGAA TTGCAAAGGC 11040

GCAGGAAAAC CTCAAGAGTT CTATTTTCCT AGTTAATTGT CACTGGCCAC TACGTTTTTA 11100 

AAAATCATAA TAACTGCATC AGATGACACT TTAAATAAAA ACATAACCAG GGCATGAAAC 11160 

ACTGTCCTCA TCCGCCTACC GCGGACATTG GAAAATAAGC CCCAGGCTGT GGAGG3CCCT 11220 

GGGAACCCTC ATGAACTCAT CCACAGGAAT CTGCAGCCTG TCCCAGGCAC TGGGGTGCAA 11280 

CCAAGATC 11288 

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3774 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iii) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

AAGCTTTTTA GTGCTTTAGA CAGTGAGCTG GTCTGTCTAA CCCAAGTGAC CTGGGCTCCA 60 

TACTCAGCCC CAGAAGTGAA GGGTGAAGCT GGGTGGAGCC AAACCAGGCA AGCCTACCCT 120 

CAGGGCTCCC AGTGGCCTGA GAACCATTGG ACCCAGGACC CATTACTTCT AGGGTAAGGA 180 

AGGTACAAAC ACCAGATCCA ACCATGGTCT GGGGGGACAG CTGTCAAATG CCTAAAAASA 240

TACCTGGCAG AGGAGCAGGC AAACTATCAC TGCCCCAGGT TCTCTGAACA GAAACAGAGG 300

CGCAACCCAA AGTCCAAATC CAGGTGAGCA CGTCCACCAA ATGCCCAGAG ATATGACGAG 360

GCAAGAAGTG AAGGAACCAC CCCTGCATCA AATGTTTTCC ATGGGAAGGA GAAGGGGGTT 420

CCTCATGTTC CCAATCCAGG AGAATGCATT TGGGATCTGC CTTCTTCTCA CTCCTTGGTT 480

AGCAAGACTA AGCAACCAGG ACTCTGGATT TGGGGAAAGA CGTTTATTTG TGGAGGCCAG 540

TGATGACAAT CCCACCAGGG CCTAGGTGAA GAGGGCAGGA AGGCTCGAGA CACTGCGGAC 600

TGAGTGAAAA CCACACCCAT GATCTGCACC ACCCATGGAT GCTCCTTCAT TGCTCACCTT 660

TCTGTTGATA TCAGATGGCC CCATTTTCTG TACCTTCACA GAAGGACACA GGCTAGGGTC 720

TGTGCATGGC CTTCATCCCC GGGGCCATGT GAGGACAGCA GGTGGGAAAG ATCATGGGTC 780

CTCCTGGGTC CTGCAGGGCC AGAACATTCA TCACCCATAC TGACCTCCTA GATGGGAATG 840

GCTTCCCTGG GGCTGGGCCA ACGGGGCCTG CGCAGGGGAG AAAGGACGTC AGGGGACAGG 900 

GAGGAAGGGT CATCGACACC CAGCCTGGAA GGTTCTTGTC TCTGACCATC CAGGATTTAC 960 

TTCCCTGCAT CTACCTTTCG TCATTTTCCC TCAGCAATGA CCAGCTCTGC TTCCTGATCT 1020

CAGCCTCCCA CCCTGGACAC AGCACCCCAG TCCCTGGCCC GCCTGCATCC ACCCAATACC 1080

CTGATAACCC AGGACCCATT ACTTCTAGGG TAAGGAGGGT CCAGGAGACA GAAGCTGAGG 1140

AAAGGTCTGA AGAAGTCACA TCTGTCCTGG CCAGAGGGGA AAAACCATCA GATGCTGAAC 1200

CAGGAGAATG TTGACCCAGG AAAGGGACCG AGGACCCAAG AAAGGAGTCA GACCACCAGG 1260

GTTTCCCTGA GAGGAAGGAT CAAGGCCCCG AGGGAAAGCA GGGCTGGCTG CATGTGCAGG 1320

ACACTCGTGG GGCATATGTG TCTTAGATTC TCGCTGAATT GAGTGTCCCT GCCATGGCCA 1380

GACTCTCTAC TCAGGCCTGG ACATGCTGAA ATAGGACAAT GGCCTTGTCC TCTCTCCCCA 1440

CCATTTGGCA AGAGACATAA AGGACATTCC AGGACATGCC TTCCTGGGAG GTCCAGGTTC 1500

TCTGTCTCAC ACCTCAGGGA CTGTAGTTAC TGCATCAGCC ATGGTAGGTG CTGATCTCAC 1560

CCAGCCTGTC CAGGCCCTTC CACTCTCCAC TTTGTGACCA TGTCCAGGAC CACCCCTCAG 1620

ATCCTGAGCC TGCAAATACC CCCTTGCTGG GTGGGTGGAT TCAGTAAACA GTGAGCTCCT 1680

ATCCAGCCCC CAGAGCCACC TCTGTCACCT TCGTGCTGGG CATCATCCCA CCTTCACAAG 1740

CACTAAAGAG CATGGGGAGA CCTGGCTAGC TGCGTTTCTG CATCACAAAG AAAATAATCC 1800

CCCAGGTTCG GATTCCCAGG GCTCTGTATG TGGAGCTGAC AGACCTGAGG CCAGGAGATA 1860

GCAGAGGTCA GCCCTAGGGA GGCTGGGTCA TCCACCCAGG GGACAGGGGT GCACCAGCCT 1920

TGCTACTGAA AGGGCCTCCC CACGACAGCG CCATCAGCCC TGCCTCAGAG CTTTGCTAAA 1980

CAGCAGTCAG AGGAGGCCAT GGCAGTGGCT GAGCTCCTGC TCCAGGCCCC AACAGACCAG 2040

ACCAACAGCA CAATGCAGTC CTTCCCCAAC GTCACACGTC ACCAAAGGCA AACTGAGGTG 2100

CTACCTAACC TTAGAGCCAT CAGGGGAGAT AACAGCCCAA TTTCCCAAAC AGGCCAGTTT 2160

CAATCCCATG ACAATGACCT CTCTGCTCTC ATTCTTCCCA AAATAGGACG CTGATTCTCC 2220

CCCACCATCG ATTTCTCCCT TGTCCCGCGA GCCTTTTCTG CCCCCTATGA TCTGGGCACT 2280

CCTGACACAC ACCTCCTCTC TGGTGACATA TCAGGGTCCC TCACTGTCAA GCAGTCCAGA 2340 

AAGGACAGAA CCTTGGACAG CGCCCATCTC AGCTTCACCC TTCCTCCTTC ACAGGGTTCA 2400 

GCGCAAAGAA TAAATGGCAG AGGCCAGTGA GCCCAGAGAT GGTGACAGGC AGTGACCCAG 2460 

CGCCAGATGC CTGGAGCAGG AGCTGGCGGG GCCACAGGGA GAAGGTGATC CACGAAGGGA 2520

AACCCAGAAA TGGGCAGGAA AGGAGGACAC AGGCTCTGTG GGGCTGCAGC CCAGGGTTCG 2580

ACTATGAGTG TGAAGCCATC TCAGCAAGTA AGGCCAGGTC CCATGAACAA GACTGGGAGC 2640

ACGTGGCTTC CTGCTCTGTA TATGCGCTGG GGGATTCCAT GCCCCATAGA ACCAGATGGC 2700

CGGGGTTCAG ATGGAGAAGG AGCAGGACAG GGGATCCCCA GGATAGGAGG ACCCCAGTGT 2760

CCCCACCCAG GCAGGTCACT GATGAATGGG CATGCAGGGT CCTCCTGCGC TGGGCTCTCC 2820

CTTTGTCCCT CAGGATTCCT TGAAGGAACA TCCGGAAGCC GACCACATCT ACCTGCTGGG 2880

TTCTGCCCAG TCCATGTAAA CCCAGGAGCT TGTGTTGCTA GGACGGGTCA TGGCATGTCC 2940

TGCGGGCACC AAAGAGAGAA ACCTGAGCGC AGGCAGGACC TGGTCTGAGG AGGCATGGGA 3000

GCCCAGATGC GGAGATGCAT GTCAGGAAAG GCTGCCCCAT CAGGGAGCGT GATAGCAATG 3060 




GCGGGTCTGT GGGAGTCGGC ACGTGGGATT CCCTGGGCTC TGCCAAGTTC CCTCCCATAG 3120

TCACAACCTG GGGACACTGC CCATGAAGGG GCGCCTTTGC CCAGCCAGAT GCTGCTGGTT 3180

CTGCCCATCC ACTACCCTCT CTGCTCCAGC CACTCTGGGT CTTTCTCCAG ATGCCCTGGA 3240

CAGCCCTGGC CTGGGCCTGT CCCCTGAGAG GTGTTGGGAG AAGCTGAGTC TCTGGGGACA 3300

CTCTCATCAG AGTCTGAAAG GCACATCAGG AAACATCCCT GGTCTCCAGG ACTAGGCAAT 3360

GAGGAAAGGG CCCCAGCTCC TCCCTTTGCC ACTGAGAGGG TCGACCCTGG GTGGCCACAG 3420

TGACTTCTGC GTCTGTCCCA GTCACCCTGA AACCACAACA AAACCCCAGC CCCAGACCCT 3480

GCAGGTACAA TACATGTGGG GACAGTCTGT ACCCAGCGGA AGCCAGTTCT CTCTTCCTAG 3540

GAGACCGGGC CTCAGGCCTG TGCCCGCGGC AGGCGGGCCC AGCACGTGCC TGTCCTTGAG 3600

AACTCGGGAC CTTAAGCGTC TCTGCTCTGT GAGGCACAGC AAGGATCCTT CTGTCCAGAG 3660

ATCAAAGCAG CTCCTCCCCC TCCTCTGACC TCTTCCTCCT TCCCAAATCT CAACCAACAA 3720

ATAGCTGTTT CAAATCTCAT CATCAAATCT TCATCCATCC ACATGAGAAA GCTT 3774
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iii) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CCCTGTCATC TCCAGGACAG CTCAGTCTCC GTCCAATCTC 40
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iii) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
CTTTCCTGAG TGATGTCTGT GTGCAATG 28

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iii) ANTI-SENSE: NO

(v) FRAGMENT TYPE: N-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CCTGGAACTC AAGCTTGAAT TCTCCACAGA GGAGG 35

CLAIMS:

1. A DNA molecule comprising the carcinoembryonic 
antigen (CEA) transcriptional regulatory sequence (TRS) 
but without associated CEA coding sequence. 

2. A molecular chimaera comprising a CEA TRS and a DNA 
sequence operatively linked thereto encoding a 
heterologous enzyme. 

3. A molecular chimaera according to claim 2 wherein the
heterologous enzyme is capable of catalysing the
production of an agent cytotoxic or cytostatic to CEA+
cells.

4. A molecular chimaera according to claim 3 wherein the
heterologous enzyme is cytosine deaminase (CD).

5. A molecular chimaera according to any of claims 2 to 
4 wherein the CEA TRS and the sequence encoding a 
heterologous enzyme are in an expression cassette. 

6. A molecular chimaera according to claim 5 which 
comprises DNA sequence of the coding sequence of the gene 
coding for the heterologous enzyme and additionally 
includes an appropriate polyadenylation sequence which is 
linked downstream in a 3' position and in proper
orientation to the CEA TRS. 

7. A retroviral shuttle vector comprising a molecular 
chimaera according to any of claims 2 to 6. 

8. A retroviral shuttle vector according to claim 7 
comprising a DNA sequence comprising a 5' viral LTR
sequence, a cis acting psi encapsidation sequence, the
molecular chimaera and a 3' viral LTR sequence.

9. A retroviral shuttle vector according to claim 8
based on Moloney murine leukaemia virus.

10. A retroviral shuttle vector according to any of 
claims 7 to 9 which is a SIN vector. 

11. An infective virion comprising a retroviral shuttle 
vector according to any of claims 7 to 10, the vector 
being encapsidated within viral proteins to create an 
artificial, infective, replication defective, retrovirus. 

12. A packaging cell line comprising a retroviral 
shuttle vector according to any of claims 7 to 10. 

13. A pharmaceutical composition comprising an infective 
virion according to claim 11 or packaging cell line 
according to claim 12 together with a pharmaceutically 
acceptable carrier. 

14. Use of CEA TRS for targeting expression of a 
heterologous enzyme to CEA+ cells.

15. Use according to claim 14 wherein the heterologous 
enzyme is capable of catalysing the production of an 
agent cytotoxic or cytostatic to CEA+ cells.

16. Use according to claim 15 wherein the heterologous 
enzyme is CD. 

17. A DNA molecule according to claim 1 which comprises
one or more of the following sequence regions of the CEA 
gene in either orientation: 

about -299b to about +69b, more preferably about -90b to 
about +69b; 

-14.4kb to -10.6kb, preferably -13.6kb to -10.6kb;
-6.1kb to -3.8kb.

18. A molecular chimaera according to any of claims 2 to 
6, retroviral shuttle vector according to any of claims 7 to
10, packaging cell line according to claim 12 or 
composition according to claim 13 wherein the CEA TRS 
comprises one or more of the following sequence regions 
of the CEA gene in either orientation: 

about -299b to about +69b, more preferably about -90b to
about +69b;

-14.4kb to -10.6kb, preferably -13.6kb to -10.6kb;
-6.1kb to -3.8kb.

19. Use according to any of claims 14 to 16 wherein the 
CEA TRS comprises one or more of the following sequence 
regions of the CEA gene in either orientation: 

about -299b to about +69b, more preferably about -90b to
about +69b;

-14.4kb to -10.6kb, preferably -13.6kb to -10.6kb;
-6.1kb to -3.8kb.
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Plasmid CEA Coordinates 

pCR113    (-299 to +69)

pCR105    (-1664 to +69)

pCR145    (-14462 to -10691) + (-299 to +69)

pCRUB     (-89 to -40) + (-90 to +69)

pCR158    [3 x (-89 to -40)] + (-90 to +69)

pCR136    (-3919 to -6071) + (-299 to +69)

pCR137    (-6071 to -3919) + (-299 to +69)

pCR162    (-13579 to -10691) + (-89 to -40) + (-90 to +69)

pCR163    (-10691 to -13579) + (-89 to -40) + (-90 to +69)



Fig.BB 
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-14463 AAGCTTTTTA GTGCTTTAGA CAGTGAGCTG GTCTGTCTAA CCCAAGTGAC CTGGGCTC 

-14403 TACTCAGCCC CAGAAGTGAA GGGTGAAGCT GGGTGGAGCC AAACCAGGCA AGCCTACC 

-14343 CAGGGCTCCC AGTGGCCTGA GAACCATTGG ACCCAGGACC CATTACTTCT AGGGTAAG

-14283 AGGTACAAAC ACCAGATCCA ACCATGGTCT GGGGGGACAG CTGTCAAATG CCTAAAAA 

-14223 TACCTGGGAG AGGAGCAGGC AAACTATCAC TGCCCCAGGT TCTCTGAACA GAAACAGA 

-14163 GGCAACCCAA AGTCCAAATC CAGGTGAGCA GGTGCACCAA ATGCCCAGAG ATATGACG

-14103 GCAAGAAGTG AAGGAACCAC CCCTGCATCA AATGTTTTGC ATGGGAAGGA GAAGGGGG 

-14043 GCTCATGTTC CCAATCCAGG AGAATGCATT TGGGATCTGC CTTCTTCTCA CTCCTTGG 

-13983 AGCAAGACTA AGCAACCAGG ACTCTGGATT TGGGGAAAGA CGTTTATTTG TGGAGGCC 

-13923 TGATGACAAT CCCACGAGGG CCTAGGTGAA GAGGGCAGGA AGGCTCGAGA CACTGGGG 

-13863 TGAGTGAAAA CCACACCCAT GATCTGCACC ACCCATGGAT GCTCCTTCAT TGCTCACC 

-13803 TCTGTTGATA TCAGATGGCC CCATTTTCTG TACCTTCACA GAAGGACACA GGCTAGGG 

-13743 TGTGCATGGC CTTCATCCCC GGGGCCATGT GAGGACAGCA GGTGGGAAAG ATCATGGG 

-13683 CTCCTGGGTC CTGCAGGGCC AGAACATTCA TCACCCATAC TGACCTCCTA GATGGGAA 

-13623 GCTTCCCTGG GGCTGGGCCA ACGGGGCCTG GGCAGGGGAG AAAGGACGTC AGGGGACA

-13563 GAGGAAGGGT CATCGAGACC CAGCCTGGAA GGTTCTTGTC TCTGACCATC CAGGATTT 

-13503 TTCCCTGCAT CTACCTTTGG TCATTTTCCC TCAGCAATGA CCAGCTCTGC TTCCTGAT 

-13443 CAGCCTCCCA CCCTGGACAC AGCACCCCAG TCCCTGGCCC GGCTGCATCC ACCCAATA 

-13383 CTGATAACCC AGGACCCATT ACTTCTAGGG TAAGGAGGGT CCAGGAGACA GAAGCTGA 

-13323 AAAGGTCTGA AGAAGTCACA TCTGTCCTGG CCAGAGGGGA AAAACCATCA GATGCTGA 

-13263 CAGGAGAATG TTGACCCAGG AAAGGGACCG AGGACCCAAG AAAGGAGTCA GACCACCA 

-13203 GTTTGCCTGA GAGGAAGGAT CAAGGCCCCG AGGGAAAGCA GGGCTGGCTG CATGTGCA 
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-13143 ACACTGGTGG GGCATATGTG TCTTAGATTC TCCCTGAATT CAGTGTCCCT GCCATGGC 

-13083 GACTCTCTAC TCAGGCCTGG ACATGCTGAA ATAGGACAAT GGCCTTGTCC TCTCTCCC

-13023 CCATTTGGCA AGAGACATAA AGGACATTCC AGGACATGCC TTCCTGGGAG GTCCAGGT

-12963 TCTGTCTCAC ACCTCAGGGA CTGTAGTTAC TGCATCAGCC ATGGTAGGTG CTGATCTC

-12903 CCAGCCTGTC CAGGCCCTTC CACTCTCCAC TTTGTGACCA TGTCCAGGAC CACCCCTC

-12843 ATCCTGAGCC TGCAAATACC CCCTTGCTGG GTGGGTGGAT TCAGTAAACA GTGAGCTC

-12783 ATCCAGCCCC CAGAGCCACC TCTGTCACCT TCCTGCTGGG CATCATCCCA CCTTCACA

-12723 CACTAAAGAG CATGGGGAGA CCTGGCTAGC TGCGTTTCTG CATCACAAAG AAAATAAT

-12663 CCCAGGTTCG GATTCCCAGG GCTCTGTATG TGGAGCTGAC AGACCTGAGG CCAGGAGA

-12603 GCAGAGGTCA GCCCTAGGGA GGGTGGGTCA TCCACCCAGG GGACAGGGGT GCACCAGC

-12543 TGCTACTGAA AGGGCCTCCC CAGGACAGCG CCATCAGCCC TGCCTGAGAG CTTTGCTA

-12483 CAGCAGTCAG AGGAGGCCAT GGCAGTGGCT GAGCTCCTGC TCCAGGCCCC AACAGACC

-12423 ACCAACAGCA CAATGCAGTC CTTCCCCAAC GTCACAGGTC ACCAAAGGGA AACTGAGG

-12363 CTACCTAACC TTAGAGCCAT CAGGGGAGAT AACAGCCCAA TTTCCCAAAC AGGCCAGT

-12303 CAATCCCATG ACAATGACCT CTCTGCTCTC ATTCTTCCCA AAATAGGACG CTGATTCT

-12243 CCCACCATGG ATTTCTCCCT TGTCCCGGGA GCCTTTTCTG CCCCCTATGA TCTGGGCA

-12183 CCTGACACAC ACCTCCTCTC TGGTGACATA TCAGGGTCCC TCACTGTCAA GCAGTCCA

-12123 AAGGACAGAA CCTTGGACAG CGCCCATCTC AGCTTCACCC TTCCTCCTTC ACAGGGTT

-12063 GGGCAAAGAA TAAATGGCAG AGGCCAGTGA GCCCAGAGAT GGTGACAGGC AGTGACCC

-12003 GGGCAGATGC CTGGAGCAGG AGCTGGCGGG GCCACAGGGA GAAGGTGATG CAGGAAGG

-11943 AACCCAGAAA TGGGCAGGAA AGGAGGACAC AGGCTCTGTG GGGCTGCAGC CCAGGGTT

-11883 ACTATGAGTG TGAAGCCATC TCAGCAAGTA AGGCCAGGTC CCATGAACAA GAGTGGGA

-11823 ACGTGGCTTC CTGCTCTGTA TATGGGGTGG GGGATTCCAT GCCCCATAGA ACCAGATG







-11763 CGGGGTTCAG ATGGAGAAGG AGCAGGACAG GGGATCCCCA GGATAGGAGG ACCCCAGT

-11703 CCCCACCCAG GCAGGTCACT GATGAATGGG CATGCAGGGT CCTCCTGGGC TGGGCTCT

-11643 CTTTGTCCCT CAGGATTCCT TGAAGGAACA TCCGGAAGCC GACCACATCT ACCTGGTG

-11583 TTCTGCCCAG TCCATGTAAA GCCAGGAGCT TGTGTTGCTA GGAGGGGTCA TGGCATGT

-11523 TGGGGGCACC AAAGAGAGAA ACCTGAGGGC AGGCAGGACC TGGTCTGAGG AGGCATGG

-11463 GCCCAGATGG GGAGATGGAT GTCAGGAAAG GCTGCCCCAT CAGGGAGGGT GATAGCAA

-11403 GGGGGTCTGT GGGAGTGGGC ACGTGGGATT CCCTGGGCTC TGCCAAGTTC CCTCCCAT

-11343 TCACAACCTG GGGACACTGC CCATGAAGGG GCGCCTTTGC CCAGCCAGAT GCTGCTGG

-11283 CTGCCCATCC ACTACCCTCT CTGCTCCAGC CACTCTGGGT CTTTCTCCAG ATGCCCTG

-11223 CAGCCCTGGC CTGGGCCTGT CCCCTGAGAG GTGTTGGGAG AAGCTGAGTC TCTGGGGA

-11163 CTCTCATCAG AGTCTGAAAG GCACATCAGG AAACATCCCT GGTCTCCAGG ACTAGGCA

-11103 GAGGAAAGGG CCCCAGCTCC TCCCTTTGCC ACTGAGAGGG TCGACCCTGG GTGGCCAC

-11043 TGACTTCTGC GTCTGTCCCA GTCACCCTGA AACCACAACA AAACCCCAGC CCCAGACC

-10983 GCAGGTACAA TACATGTGGG GACAGTCTGT ACCCAGGGGA AGCCAGTTCT CTCTTCCT

-10923 GAGACCGGGC CTCAGGGCTG TGCCCGGGGC AGGCGGGGGC AGCACGTGCC TGTCCTTG

-10863 AACTCGGGAC CTTAAGGGTC TCTGCTCTGT GAGGCACAGC AAGGATCCTT CTGTCCAG

-10803 ATGAAAGCAG CTCCTCCCCC TCCTCTGACC TCTTCCTCCT TCCCAAATCT CAACCAAC

-10743 ATAGGTGTTT CAAATCTCAT CATCAAATCT TCATCCATCC ACATGAGAAA GCTTAAAA

-10683 CAATGGATTG ACAACATCAA GAGTTGGAAC AAGTGGACAT GGAGATGTTA CTTGTGGA

-10623 TTTAGATGTG TTCAGCTATC GGGCAGGAGA ATCTGTGTCA AATTCCAGCA TGGTTCAG

-10563 GAATCAAAAA GTGTCACAGT CCAAATGTGC AACAGTGCAG GGGATAAAAC TGTGGTGC

-10503 TCAAACTGAG GGATATTTTG GAACATGAGA AAGGAAGGGA TTGCTGCTGC ACAGAACA

-10443 GATGATCTCA CACATAGAGT TGAAAGAAAG GAGTCAATCG CAGAATAGAA AATGATCA

Fig. 6 (3/11) 




-10383 AATTCCACCT CTATAAAGTT TCCAAGAGGA AAACCCAATT CTGCTGCTAG AGATCAGA 

-10323 GGAGGTGACC TGTGCCTTGC AATGGCTGTG AGGGTCACGG GAGTGTCACT TAGTGCAG 

-10263 AATGTGCCGT ATCTTAATCT GGGCAGGGCT TTCATGAGCA CATAGGAATG CAGACATT 

-10203 TGCTGTGTTC ATTTTACTTC ACCGGAAAAG AAGAATAAAA TCAGCCGGGC GCGGTGGC 

-10143 ACGCCTGTAA TCCCAGCACT TTAGAAGGCT GAGGTGGGCA GATTACTTGA GGTCAGGA 

-10083 TCAAGACCAC CCTGGCCAAT ATGGTGAAAC CCCGGCTCTA CTAAAAATAC AAAAATTA 

-10023 TGGGCATGGT GGTGCGCGCC TGTAATCCCA GCTACTCGGG AGGCTGAGGC TGGACAAT 

-9963 CTTGGACCCA GGAAGCAGAG GTTGCAGTGA GCCAAGATTG TGCCACTGCA CTCCAGCT 

-9903 GGCAACAGAG CCAGACTCTG TAAAAAAAAA AAAAAAAAAA AAAAAAAGAA AGAAAGAA 

-9843 AGAAAAGAAA GTATAAAATC TCTTTGGGTT AACAAAAAAA GATCCACAAA ACAAACAC 

-9783 GCTCTTATCA AACTTACACA ACTCTGCCAG AGAACAGGAA ACACAAATAC TCATTAAC 

-9723 ACTTTTGTGG CAATAAAACC TTCATGTCAA AAGGAGACCA GGACACAATG AGGAAGTA 

-9663 ACTGCAGGCC CTACTTGGGT GCAGAGAGGG AAAATCCACA AATAAAACAT TACCAGAA 

-9603 AGCTAAGATT TACTGCATTG AGTTCATTCC CCAGGTATGC AAGGTGATTT TAACACCT 

-9543 AAATCAATCA TTGCCTTTAC TACATAGACA GATTAGCTAG AAAAAAATTA CAACTAGC 

-9483 AACAGAAGCA ATTTGGCCTT CCTAAAATTC CACATCATAT CATCATGATG GAGACAGT

-9423 AGACGCCAAT GACAATAAAA AGAGGGACCT CCGTCACCCG GTAAACATGT CCACACAG

-9363 CCAGCAAGCA CCCGTCTTCC CAGTGAATCA CTGTAACCTC CCCTTTAATC AGCCCCAG 

-9303 AAGGCTGCCT GCGATGGCCA CACAGGCTCC AACCCGTGGG CCTCAACCTC CCGCAGAG 

-9243 TCTCCTTTGG CCACCCCATG GGGAGAGCAT GAGGACAGGG CAGAGCCCTC TGATGCCC 

-9183 ACATGGCAGG AGCTGACGCC AGAGCCATGG GGGCTGGAGA GCAGAGCTGC TGGGGTCA 

-9123 GCTTCCTGAG GACACCCAGG CCTAAGGGAA GGCAGCTCCC TGGATGGGGG CAACCAGG 

-9063 CCGGGCTCCA ACCTCAGAGC CCGCATGGGA GGAGCCAGCA CTCTAGGCCT TTCCTAGG 
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-9003 GACTCTGAGG GGACCCTGAC ACGACAGGAT CGCTGAATGC ACCCGAGATG AAGGGGCC 

-8943 CACGGGACCC TGCTCTCGTG GCAGATCAGG AGAGAGTGGG ACACCATGCC AGGCCCCC

-8883 GGCATGGCTG CGACTGACCC AGGCCACTCC CCTGCATGCA TCAGCCTCGG TAAGTCAC

-8823 GACCAAGCCC AGGACCAATG TGGAAGGAAG GAAACAGCAT CCCCTTTAGT GATGGAAC

-8763 AAGGTCAGTG CAAAGAGAGG CCATGAGCAG TTAGGAAGGG TGGTCCAACC TACAGCAC

-8703 ACCATCATCT ATCATAAGTA GAAGCCCTGC TCCATGACCC CTGCATTTAA ATAAACGT

-8643 GTTAAATGAG TCAAATTCCC TCACCATGAG AGCTCACCTG TGTGTAGGCC CATCACAC

-8583 ACAAACACAC ACACACACAC ACACACACAC ACACACACAC ACAGGGAAAG TGCAGGAT

-8523 TGGACAGCAC CAGGCAGGCT TCACAGGCAG AGCAAACAGC GTGAATGACC CATGCAGT

-8463 CCTGGGCCCC ATCAGCTCAG AGACCCTGTG AGGGCTGAGA TGGGGCTAGG CAGGGGAG

-8403 ACTTAGAGAG GGTGGGGCCT CCAGGGAGGG GGCTGCAGGG AGCTGGGTAC TGCCCTCC

-8343 GGAGGGGGCT GCAGGGAGCT GGGTACTGCC CTCCAGGGAG GGGGCTGCAG GGAGCTGG

-8283 ACTGCCCTCC AGGGAGGGGG CTGCAGGGAG CTGGGTACTG CCCTCCAGGG AGGGGGCT

-8223 AGGGAGCTGG GTACTGCCCT CCAGGGAGGC AGGAGCACTG TTCCCAACAG AGAGCACA

-8163 TTCCTGCAGC AGCTGCACAG ACACAGGAGC CCCCATGACT GCCCTGGGCC AGGGTGTG

-8103 TTCCAAATTT CGTGCCCCAT TGGGTGGGAC GGAGGTTGAC CGTGACATCC AAGGGGCA

-8043 TGTGATTCCA AACTTAAACT ACTGTGCCTA CAAAATAGGA AATAACCCTA CTTTTTCT

-7983 TATCTCAAAT TCCCTAAGCA CAAGCTAGCA CCCTTTAAAT CAGGAAGTTC AGTCACTC

-7923 GGGGTCCTCC CATGCCCCCA GTCTGACTTG CAGGTGCACA GGGTGGCTGA CATCTGTC

-7863 TGCTCCTCCT CTTGGCTCAA CTGCCGCCCC TCCTGGGGGT GACTGATGGT CAGGACAA

-7803 GATCCTAGAG CTGGCCCCAT GATTGACAGG AAGGCAGGAC TTGGCCTCCA TTCTGAAG

-7743 TAGGGGTGTC AAGAGAGCTG GGCATCCCAC AGAGCTGCAC AAGATGACGC GGACAGAG

-7683 TGACACAGGG CTCAGGGCTT CAGACGGGTC GGGAGGCTCA GCTGAGAGTT CAGGGACA
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-7623 CCTGAGGAGC CTCAGTGGGA AAAGAAGCAC TGAAGTGGGA AGTTCTGGAA TGTTCTGG 

-7563 AAGCCTGAGT GCTCTAAGGA AATGCTCCCA CCCCGATGTA GCCTGCAGCA CTGGACGG

-7503 TGTGTACCTC CCCGCTGCCC ATCCTCTCAC AGCCCCCGCC TCTAGGGACA CAACTCCT

-7443 CCTAACATGC ATCTTTCCTG TCTCATTCCA CACAAAAGGG CCTCTGGGGT CCCTGTTC

-7383 CATTGCAAGG AGTGGAGGTC ACGTTCCCAC AGACCACCCA GCAACAGGGT CCTATGGA

-7323 TGCGGTCAGG AGGATCACAC GTCCCCCCAT GCCCAGGGGA CTGACTCTGG GGGTGATG

-7263 TTGGCCTGGA GGCCACTGGT CCCCTCTGTC CCTGAGGGGA ATCTGCACCC TGGAGGCT

-7203 CACATCCCTC CTGATTCTTT CAGCTGAGGG CCCTTCTTGA AATCCCAGGG AGGACTCA

-7143 CCCCACTGGG AAAGGCCCAG TGTGGACGGT TCCACAGCAG CCCAGCTAAG GCCCTTGG

-7083 ACAGATCCTG AGTGAGAGAA CCTTTAGGGA CACAGGTGCA CGGCCATGTC CCCAGTGC

-7023 ACACAGAGCA GGGGCATCTG GACCCTGAGT GTGTAGCTCC CGCGACTGAA CCCAGCCC

-6963 CCCCAATGAC GTGACCCCTG GGGTGGCTCC AGGTCTCCAG TCCATGCCAC CAAAATCT

-6903 AGATTGAGGG TCCTCCCTTG AGTCCCTGAT GCCTGTCCAG GAGCTGCCCC CTGAGCAA

-6843 CTAGAGTGCA GAGGGCTGGG ATTGTGGCAG TAAAAGCAGC CACATTTGTC TCAGGAAG

-6783 AAGGGAGGAC ATGAGCTCCA GGAAGGGCGA TGGCGTCCTC TAGTGGGCGC CTCCTGTT

-6723 TGAGCAAAAA GGGGCCAGGA GAGTTGAGAG ATCAGGGCTG GCCTTGGACT AAGGCTCA

-6663 TGGAGAGGAC TGAGGTGCAA AGAGGGGGCT GAAGTAGGGG AGTGGTCGGG AGAGATGG

-6603 GGAGCAGGTA AGGGGAAGCC CCAGGGAGGC CGGGGGAGGG TACAGCAGAG CTCTCCAC

-6543 CTCAGCATTG ACATTTGGGG TGGTCGTGCT AGTGGGGTTC TGTAAGTTGT AGGGTGTT

-6483 GCACCATCTG GGGACTCTAC CCACTAAATG CCAGCAGGAC TCCCTCCCCA AGCTCTAA

-6423 ACCAACAATG TCTCCAGACT TTCCAAATGT CCCCTGGAGA GCAAAATTGC TTCTGGCA

-6363 ATCACTGATC TACGTCAGTC TCTAAAAGTG ACTCATCAGC GAAATCCTTC ACCTCTTG

-6303 AGAAGAATCA CAAGTGTGAG AGGGGTAGAA ACTGCAGACT TCAAAATCTT TCCAAAAG

-6243 TTTTACTTAA TCAGCAGTTT GATGTCCCAG GAGAAGATAC ATTTAGAGTG TTTAGAGT

-6183 ATGCCACATG GCTGCCTGTA CCTCACAGCA GGAGCAGAGT GGGTTTTCCA AGGGCCTG 

-6123 ACCACAACTG GAATGACACT CACTGGGTTA CATTACAAAG TGGAATGTGG GGAATTCT 

-6063 AGACTTTGGG AAGGGAAATG TATGACGTGA GCCCACAGCC TAAGGCAGTG GACAGTCC 

-6003 TTTGAGGCTC TCACCATCTA GGAGACATCT CAGCCATGAA CATAGCCACA TCTGTCAT 

-5943 GAAAACATGT TTTATTAAGA GGAAAAATCT AGGCTAGAAG TGCTTTATGC TCTTTTTT 

-5883 CTTTATGTTC AAATTCATAT ACTTTTAGAT CATTCCTTAA AGAAGAATCT ATCCCCCT 

-5823 GTAAATGTTA TCACTGACTG GATAGTGTTG GTGTCTCACT CCCAACCCCT GTGTGGTG 

-5763 AGTGCCCTGC TTCCCCAGCC CTGGGCCCTC TCTGATTCCT GAGAGCTTTG GGTGCTCC 

-5703 CATTAGGAGG AAGAGAGGAA GGGTGTTTTT AATATTCTCA CCATTCACCC ATCCACCT 

-5643 TAGACACTGG GAAGAATCAG TTGCCCACTC TTGGATTTGA TCCTCGAATT AATGACCT 

-5583 ATTTCTGTCC CTTGTCCATT TCAACAATGT GACAGGCCTA AGAGGTGCCT TCTCCATG 

-5523 ATTTTTGAGG AGAAGGTTCT CAAGATAAGT TTTCTCACAC CTCTTTGAAT TACCTCCA 

-5463 TGTGTCCCCA TCACCATTAC CAGCAGCATT TGGACCCTTT TTCTGTTAGT CAGATGCT 

-5403 CCACCTCTTG AGGGTGTATA CTGTATGCTC TCTACACAGG AATATGCAGA GGAAATAG 

-5343 AAAGGGAAAT CGCATTACTA TTCAGAGAGA AGAAGACCTT TATGTGAATG AATGAGAG

-5283 TAAAATCCTA AGAGAGCCCA TATAAAATTA TTACCAGTGC TAAAACTACA AAAGTTAC 

-5223 TAACAGTAAA CTAGAATAAT AAAACATGCA TCACAGTTGC TGGTAAAGCT AAATCAGA 

-5163 TTTTTTTCTT AGAAAAAGCA TTCCATGTGT GTTGCAGTGA TGACAGGAGT GCCCTTCA 

-5103 CAATATGCTG CCTGTAATTT TTGTTCCCTG GCAGAATGTA TTGTCTTTTC TCCCTTTA 

-5043 TCTTAAATGC AAAACTAAAG GCAGCTCCTG GGCCCCCTCC CCAAAGTCAG CTGCCTGC 

-4983 CCAGCCCCAC GAAGAGCAGA GGCCTGAGCT TCCCTGGTCA AAATAGGGGG CTAGGGAG 

-4923 TAACCTTGCT CGATAAAGCT GTGTTCCCAG AATGTCGCTC CTGTTCCCAG GGGCACCA 

-4863 CTGGAGGGTG GTGAGCCTCA CTGGTGGCCT GATGCTTACC TTGTGCCCTC ACACCAGT 



-4803 TCACTGGAAC CTTGAACACT TGGCTGTCGC CCGGATCTGC AGATGTCAAG AACTTCTG 

-4743 AGTCAAATTA CTGCCCACTT CTCCAGGGCA GATACCTGTG AACATCCAAA ACCATGCC 

-4683 AGAACCCTGC CTGGGGTCTA CAACACATAT GGACTGTGAG CACCAAGTCC AGCCCTGA 

-4623 CTGTGACCAC CTGCCAAGAT GCCCCTAACT GGGATCCACC AATCACTGCA CATGGCAG 

-4563 AGCGAGGCTT GGAGGTGCTT CGCCACAAGG CAGCCCCAAT TTGCTGGGAG TTTCTTGG 

-4503 CCTGGTAGTG GTGAGGAGCC TTGGGACCCT CAGGATTACT CCCCTTAAGC ATAGTGGG 

-4443 CCCTTCTGCA TCCCGAGCAG GTGCCCCGCT CTTCAGAGCC TCTCTCTCTG AGGTTTAC 

-4383 AGACCCCTGC ACCAATGAGA CCATGCTGAA GCCTCAGAGA GAGAGATGGA GCTTTGAC 

-4323 GGAGCCGCTC TTCCTTGAGG GCCAGGGCAG GGAAAGCAGG AGGCAGCACC AGGAGTGG 

-4263 ACACCAGTGT CTAAGCCCCT GATGAGAACA GGGTGGTCTC TCCCATATGC CCATACCA 

-4203 CCTGTGAACA GAATCCTCCT TCTGCAGTGA CAATGTCTGA GAGGACGACA TGTTTCCC 

-4143 CCTAACGTGC AGCCATGCCC ATCTACCCAC TGCCTACTGC AGGACAGCAC CAACCCAG 

-4083 GCTGGGAAGC TGGGAGAAGA CATGGAATAC CCATGGCTTC TCACCTTCCT CCAGTCCA 

-4023 GGGCACCATT TATGCCTAGG ACACCCACCT GCCGGCCCCA GGCTCTTAAG AGTTAGGT 

-3963 CCTAGGTGCC TCTGGGAGGC CGAGGCAGGA GAATTGCTTG AACCCGGGAG GCAGAGGT 

-3903 CAGTGAGCCG AGATCACACC ACTGCACTCC AGCCTGGGTG ACAGAATGAG ACTCTGTC 

-3843 AAAAAAAAAG AGAAAGATAG CATCAGTGGC TACCAAGGGC TAGGGGCAGG GGAAGGTG 

-3783 GAGTTAATGA TTAATAGTAT GAAGTTTCTA TGTGAGATGA TGAAAATGTT CTGGAAAA 

-3723 AAATATAGTG GTGAGGATGT AGAATATTGT GAATATAATT AACGGCATTT AATTGTAC 

-3663 TTAACATGAT TAATGTGGCA TATTTTATCT TATGTATTTG ACTACATCCA AGAAACAC 

-3603 GGAGAGGGAA AGCCCACCAT GTAAAATACA CCCACCCTAA TCAGATAGTC CTCATTGT 

-3543 CCAGGTACAG GCCCCTCATG ACCTGCACAG GAATAACTAA GGATTTAAGG ACATGAGG 

-3483 TCCCAGCCAA CTGCAGGTGC ACAACATAAA TGTATCTGCA AACAGACTGA GAGTAAAG 

-3423 GGGGGCACAA ACCTCAGCAC TGCCAGGACA CACACCCTTC TCGTGGATTC TGACTTTA 

-3363 TGACCCGGCC CACTGTCCAG ATCTTGTTGT GGGATTGGGA CAAGGGAGGT CATAAAGC 

-3303 GTCCCCAGGG CACTCTGTGT GAGCACACGA GACCTCCCCA CCCCCCCACC GTTAGGTC 

-3243 CACACATAGA TCTGACCATT AGGCATTGTG AGGAGGACTC TAGCGCGGGC TCAGGGAT 

-3183 CACCAGAGAA TCAGGTACAG AGAGGAAGAC GGGGCTCGAG GAGCTGATGG ATGACACA 

-3123 GCAGGGTTCC TGCAGTCCAC AGGTCCAGCT CACCCTGGTG TAGGTGCCCC ATCCCCCT 

-3063 TCCAGGCATC CCTGACACAG CTCCCTCCCG GAGCCTCCTC CCAGGTGACA CATCAGGG 

-3003 CCTCACTCAA GCTGTCCAGA GAGGGCAGCA CCTTGGACAG CGCCCACCCC ACTTCACT 

-2943 TCCTCCCTCA CAGGGCTCAG GGCTCAGGGC TCAAGTCTCA GAACAAATGG CAGAGGCC 

-2883 TGAGCCCAGA GATGGTGACA GGGCAATGAT CCAGGGGCAG CTGCCTGAAA CGGGAGCA 

-2823 TGAAGCCACA GATGGGAGAA GATGGTTCAG GAAGAAAAAT CCAGGAATGG GCAGGAGA 

-2763 AGAGGAGGAC ACAGGCTCTG TGGGGCTGCA GCCCAGGATG GGACTAAGTG TGAAGACA 

-2703 TCAGCAGGTG AGGCCAGGTC CCATGAACAG AGAAGCAGCT CCCACCTCCC CTGATGCA 

-2643 GACACACAGA GTGTGTGGTG CTGTGCCCCC AGAGTCGGGC TCTCCTGTTC TGGTCCCC 

-2583 GGAGTGAGAA GTGAGGTTGA CTTGTCCCTG CTCCTCTCTG CTACCCCAAC ATTCACCT 

-2523 TCCTCATGCC CCTCTCTCTC AAATATGATT TGGATCTATG TCCCCGCCCA AATCTCAT 

-2463 CAAATTGTAA ACCCCAATGT TGGAGGTGGG GCCTTGTGAG AAGTGATTGG ATAATGCG 

-2403 TGGATTTTCT GCTTTGATGC TGTTTCTGTG ATAGAGATCT CACATGATCT GGTTGTTT 

-2343 AAGTGTGTAG CACCTCTCCC CTCTCTCTCT CTCTCTCTTA CTCATGCTCT GCCATGTA 

-2283 ACGTTCCTGT TTCCCCTTCA CCGTCCAGAA TGATTGTAAG TTTTCTGAGG CCTCCCCA 

-2223 AGCAGAAGCC ACTATGCTTC CTGTACAACT GCAGAATGAT GAGCGAATTA AACCTCTT 

-2163 CTTTATAAAT TACCCAGTCT CAGGTATTTC TTTATAGCAA TGCGAGGACA GACTAATA 

-2103 ATCTTCTACT CCCAGATCCC CGCACACGCT TAGCCCCAGA CATCACTGCC CCTGGGAG 

-2043 TGCACAGCGC AGCCTCCTGC CGACAAAAGC AAAGTCACAA AAGGTGACAA AAATCTGC 

-1983 TTGGGGACAT CTGATTGTGA AAGAGGGAGG ACAGTACACT TGTAGCCACA GAGACTGG 



-1923 CTCACCGAGC TGAAACCTGG TAGCACTTTG GCATAACATG TGCATGACCC GTGTTCAA 

-1863 TCTAGAGATC AGTGTTGAGT AAAACAGCCT GGTCTGGGGC CGCTGCTGTC CCCACTTC 

-1803 TCCTGTCCAC CAGAGGGCGG CAGAGTTCCT CCCACCCTGG AGCCTCCCCA GGGGCTGC 

-1743 ACCTCCCTCA GCCGGGCCCA CAGCCCAGCA GGGTCCACCC TCACCCGGGT CACCTCGG 

-1683 CACGTCCTCC TCGCCCTCCG AGCTCCTCAC ACGGACTCTG TCAGCTCCTC CCTGCAGC 

-1623 ATCGGCCGCC CACCTGAGGC TTGTCGGCCG CCCACTTGAG GCCTGTCGGC TGCCCTCT 

-1563 AGGCAGCTCC TGTCCCCTAC ACCCCCTCCT TCCCCGGGCT CAGCTGAAAG GGCGTCTC 

-1503 AGGGCAGCTC CCTGTGATCT CCAGGACAGC TCAGTCTCTC ACAGGCTCCG ACGCCCCC 

-1443 TGCTGTCACC TCACAGCCCT GTCATTACCA TTAACTCCTC AGTCCCATGA AGTTCACT 

-1383 GCGCCTGTCT CCCGGTTACA GGAAAACTCT GTGACAGGGA CCACGTCTGT CCTGCTCT 

-1323 GTGGAATCCC AGGGCCCAGC CCAGTGCCTG ACACGGAACA GATGCTCCAT AAATACTG 

-1263 TAAATGTGTG GGAGATCTCT AAAAAGAAGC ATATCACCTC CGTGTGGCCC CCAGCAGT 

-1203 GAGTCTGTTC CATGTGGACA CAGGGGCACT GGCACCAGCA TGGGAGGAGG CCAGCAAG 

-1143 CCCGCGGCTG CCCCAGGAAT GAGGCCTCAA CCCCCAGAGC TTCAGAAGGG AGGACAGA 

-1083 CCTGCAGGGA ATAGATCCTC CGGCCTGACC CTGCAGCCTA ATCCAGAGTT CAGGGTCA 

-1023 TCACACCACG TCGACCCTGG TCAGCATCCC TAGGGCAGTT CCAGACAAGG CCGGAGGT 

-963 CCTCTTGCCC TCCAGGGGGT GACATTGCAC ACAGACATCA CTCAGGAAAC GGATTCCC 

-903 GGACAGGAAC CTGGCTTTGC TAAGGAAGTG GAGGTGGAGC CTGGTTTCCA TCCCTTGC 

-843 CAACAGACCC TTCTGATCTC TCCCACATAC CTGCTCTGTT CCTTTCTGGG TCCTATGA 

-783 ACCCTGTTCT GCCAGGGGTC CCTGTGCAAC TCCAGACTCC CTCCTGGTAC CACCATGG 

-723 AAGGTGGGGT GATCACAGGA CAGTCAGCCT CGCAGAGACA GAGACCACCC AGGACTGT 

-663 GGGAGAACAT GGACAGGCCC TGAGCCGCAG CTCAGCCAAC AGACACGGAG AGGGAGGG 

-603 CCCCTGGAGC CTTCCCCAAG GACAGCAGAG CCCAGAGTCA CCCACCTCCC TCCACCAC 

-543 TCCTCTCTTT CCAGGACACA CAAGACACCT CCCCCTCCAC ATGCAGGATC TGGGGACT 

-483 TGAGACCTCT GGGCCTGGGT CTCCATCCCT GGGTCAGTGG CGGGGTTGGT GGTACTGG 

-423 ACAGAGGGCT GGTCCCTCCC CAGCCACCAC CCAGTGAGCC TTTTTCTAGC CCCCAGAG 

-363 ACCTCTGTCA CCTTCCTGTT GGGCATCATC CCACCTTCCC AGAGCCCTGG AGAGCATG 

-303 GAGACCCGGG ACCCTGCTGG GTTTCTCTGT CACAAAGGAA AATAATCCCC CTGGTGTG 

-243 AGACCCAAGG ACAGAACACA GCAGAGGTCA GCACTGGGGA AGACAGGTTG TCCTCCCA 

-183 GGATGGGGGT CCATCCACCT TGCCGAAAAG ATTTGTCTGA GGAACTGAAA ATAGAAGG 

-123 AAAAAGAGGA GGGACAAAAG AGGCAGAAAT GAGAGGGGAG GGGACAGAGG ACACCTGA 

-63 AAAGACCACA CCCATGACCC ACGTGATGCT GAGAAGTACT CCTGCCCTAG GAAGAGAC 

-3 AGGGCAGAGG GAGGAAGGAC AGCAGACCAG ACAGTCACAG CAGCCTTGAC AAAACGTT 

57 TGGAACTCAA GCTCTTCTCC ACAGAGGAGG ACAGAGCAGA CAGCAGAGAC CATGGAGT 

117 CCCTCGGCCC CTCCCCACAG ATGGTGCATC CCCTGGCAGA GGCTCCTGCT CACAGGTG 

177 GGGAGGACAA CCTGGGAGAG GGTGGGAGGA GGGAGCTGGG GTCTCCTGGG TAGGACAG 

237 CTGTGAGACG GACAGAGGGC TCCTGTTGGA GCCTGAATAG GGAAGAGGAC ATCAGAGA 

297 GACAGGAGTC ACACCAGAAA AATCAAATTG AACTGGAATT GGAAAGGGGC AGGAAAAC 

357 CAAGAGTTCT ATTTTCCTAG TTAATTGTCA CTGGCCACTA CGTTTTTAAA AATCATAA 

417 ACTGGATCAG ATGACACTTT AAATAAAAAC ATAACCAGGG CATGAAACAC TGTCCTCA 

477 CGCCTACCGC GGACATTGGA AAATAAGCCC CAGGCTGTGG AGGGCCCTGG GAACCCTC 

537 GAACTCATCC ACAGGAATCT GCAGCCTGTC CCAGGCACTG GGGTGCAACC AAGATC 

Fig. 6 (11/11) 